Gather round, it's time to cheer. Episode 93 is here. OpenAI sets sail its ship, Sora's out of sight. Gemini 2 shines bright in the starry night. AI agents surf the web, browsing all so free.
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode we will be summarizing and discussing some of last week's most interesting AI news, and as always you can also go to lastweekin.ai for our text newsletter, and you will also find the links to all these stories there and in the description of this episode. I am one of your hosts, as always, Andrey Kurenkov.
My background is of having studied AI in grad school and now being at an AI startup.
And I'm your other host, Jeremy Harris. Um, obviously, you know, Gladstone AI, we've talked about that a bunch of times. You know who I am, hopefully. Um, what you may not know is that, uh, I just moved. And so you're going to hear some echoes. Unfortunately, on my end, I wasn't able to find a room that doesn't have an intense echo, and uh, my newborn daughter is being taken care of by my uh, saintly wife in the next room and downstairs.
So I just wanted to get that out of the way, but that does mean that uh, unfortunately for this week you gotta put up with an echo and you gotta put up with the fact I don't have curtains in this room. So if you're watching this on YouTube, uh, that is why my face is just inundated, bathed in the sun's sweet, sweet rays. And that is my,
we're doing what we can with the life circumstances we're given, you know, and it'll be an interesting opportunity to test out the latest in AI audio enhancement. Adobe, we didn't cover this, but they did release a new iteration of their, uh, audio podcast, uh, tool to take noisy audio and make it nice sounding. So, you know, you never know, maybe there's going to be no echo if it actually works really well. Yeah,
that's right. Hopefully people are listening to this and being like, wait, what the, like, what are they even talking about? And, uh, we'll of course use our, our, um, uh, AI powered content improvement as well, which will completely replace every word that I say with something actually insightful. Yeah,
exactly. People give us good reviews and they don't know that it really is mostly the AI doing the work. It's not us. Well,
Andrey, as a large language model trained by OpenAI, I can't respond to that comment in a direct way. But I will tell you not to bury dead bodies or make bombs at home. So,
I mean, that's what you got to do if nothing else, right? All right. Well, let's do a quick preview of what we'll be talking about in this episode. We have some major stories on the tools and applications front. It's been a big week at OpenAI, releasing Sora and a bunch of other stuff. And Google also has Gemini 2.0, some agent announcements.
And then moving on to applications and business, as always, some exciting developments at OpenAI and a bunch of stuff going on with data centers, as has been the story of the past, I don't know, six months or so, or maybe an entire year. Research and advancements: we have some pretty cool new ideas going on in reasoning and in memory, uh, sort of, uh, yeah, it'll be interesting if you like the technical side. And then policy and safety.
We have some new developments on the Trump administration and, uh, as always a bit of stuff on the US-China relationship and a bit of a mix of stuff in general. But before we get there, our usual prelude: do you want to do a quick shout out to some feedback? We've gotten, uh, some interesting new reviews on Apple Podcasts. The most recent one, I'm betting you haven't seen this Jeremy, its title is "What a great find."
And then, funny story: this listener and their wife just listened to the podcast for 22 hours, broken up into two days, while driving out and back to Indiana for Thanksgiving. So that's impressive.
I mean, uh, I guess if you really want to know what happened in the past year of, uh, AI developments, that will do it. I don't know if I could listen to 22 hours of us over the course of two days, but, uh, we were honored, I guess. And then just one more I want to shout out, uh, there's another cool review that mentions that this reviewer is on the
industry side and has used machine learning for a long time in the non-IT sector and has actually read some of the papers that, uh, have been covered here on the podcast. Which is cool. I don't know how many people listening are more technical and do find the papers interesting or go on to read them. But I guess that is one of the things we like to cover. And this does have an interesting question we can maybe chat about a little bit.
Uh, this reviewer says that they don't quite believe in sort of AI doom scenarios, but does like to hear about the developments. And there's a question here on whether erring on the side of caution can lead to negative outcomes; for instance, Greenpeace and anti-GMO groups opposed, uh, Golden Rice, and that led to a bunch of negative outcomes after that.
Yeah. No, I mean, I think that's, it's just, it's just a good question. Just a good point. You know, history is full of examples of, of things in both these directions. Um, and for, for many different reasons, right? Like one, one thing you can do is pull the fire alarm too early, for example, and stifle a field just when it's at its inception, right? That would be a big, big problem.
And certainly we've seen, like, the enormous benefits from things like open source in AI, and historically in software in general, you know, you want to think very, very hard about how you do that. And then there's also just like,
If you, if you're concerned about AI to the point where, you know, you're like, I think they, like the U. S. government, Department of Defense, whatever, uh, the intelligence community should not be accessing these tools, you know, then there's, you know, we have adversaries who are, who are going to do this. And so, um, it's a very complex space. It's difficult to, to know what to do. There's also examples of the kind of flip side of this, right?
So, for example, a traditional one is nuclear weapons where, you know, you think back to the kind of late '30s, early '40s even, and a lot of nuclear research that was deeply and profoundly relevant to the weaponization of, um, of nuclear, uh, technology was out in the open. In fact, there was a very public argument that, uh, you should not lock things down. In fact, the space continued as a field of open research.
Um, some have argued, uh, long after it should have been locked down. Um, so, so figuring out what the right analog is historically for, for AI is really hard. It depends on how seriously you take, you know, the risk of Chinese exfiltration of powerful models, uh, the risk that the models can be weaponized with catastrophic impact, the risk that the models may autonomously have catastrophic effects.
All this stuff, you know, feeds into everybody's, uh, everybody's perspectives on this. And then there's a whole bunch of like, the future is just hard to predict stuff going on here too. So, um, no, I think it's just a good question. And I wish I had all the answers. I don't think anyone really does. And the key is you got to keep all these things in your head at the same time.
Exactly. And, uh, I guess worth also mentioning, I forget where this comment was posted, but someone did post, kind of mentioning that we could be providing a little bit more of an international perspective. I think it's pretty clear that we are covering this from the, let's say, Western US-Canada perspective and often do portray China gaining advanced AI capabilities as not necessarily what we want.
So, yeah, calling that out: we do kind of have some opinion on that front. We try to both objectively cover the news and give our take, which typically is, uh, especially for Jeremy, I think, on the concerned-about-China front. So, um, yeah, like, uh,
yeah, on that note, I think I'd just like to share and be open about my perspective on this, and you can factor it in any way you want, right? That's the beauty of, uh, of the, the podcasting scene. But, um, yeah, I mean, I, I think, uh, the world is much better off, personally, this is just my view, with the United States well ahead on, on the AI side of things. I think the CCP is, uh, in my opinion, a very dangerous force in the world.
And, uh, we need to find ways to counter, especially, their use of advanced AI for military and other applications. Um, and I think that they're highly competent at things like exfiltration, and, um, anyway, this all quickly devolves into, like, how you view different powers in the world. And I do think there are very objective reasons for thinking that the CCP is not the best friend of the Chinese people and certainly not the best friend of the West.
And, and that's, uh, you know, that's a stance that I tend to take. Um, you can, you know, you can factor in your own priors however you like, but I think that's where I land at the end of the day.
And I think we do want to be careful when we do portray, uh, China as maybe not necessarily a positive force; that is very much about the CCP, right? Because, uh, certainly being in grad school, I know many people from China, you know, a lot of research comes out of China. This is not about people who are Chinese, it's about the government and what may happen if they utilize AI in, let's say, nefarious ways. Alrighty, and then just one more thing before we get to news.
As usual, we do need to shout out our sponsor, and as has been the case for a little while now, the sponsor for this week is the Generator, which is the interdisciplinary AI lab focused on entrepreneurial AI at Babson College, which has been the number one school for entrepreneurship for over 30 consecutive years in the US. And what happened was, last year, various professors at Babson partnered with students to launch this new
interdisciplinary lab that does various things, like focusing on AI entrepreneurship and business innovation, AI ethics in society, the future of work and talent, AI arts and performance, and other things. So they look into a lot of sort of emergent trends. They train the faculty of Babson to be aware of AI concepts and AI tools, which I suppose, if you are an entrepreneur, you certainly want to be
on the cutting edge of, using at least ChatGPT, Perplexity, you know, the tools that make you more productive. So, uh, yeah, once again, uh, shout out to them and, uh, and thank you for your sponsorship. All right. And getting into tools and apps, the first story is what I would say is probably the big story of the week, which is the launch of Sora. So Sora, the text-to-video AI model from OpenAI,
first kind of teased at the beginning of 2024, I think one of the big kind of starter events for AI this year. Well, it took a while, but now you actually can access it and can use it as a tool, uh, assuming that the website is up. Actually, it was so popular that ChatGPT went down, which was a little bit annoying as someone who uses their API. So this is a pretty full-featured, uh, kind of, uh, consumer product is what this turned out to be.
There's a website you can go to, and then there's quite a bit of user interface to it. So there is the basic flow of: you give it text and you generate a video. Cool. But they have a more advanced, uh, kind of tool set of being able to have a timeline of videos. Uh, and in addition to that, they also have an explore page with community-generated videos and a lot of, uh, ability to browse, uh, what various people have made. And as we would sort of kind of expect, videos look really nice.
Uh, I don't know that they look like, let's say, Sora 2.0. If you look back on the beginning of the year and look to now, it's not a leap. Like, it's similar, I would say, to what we've been seeing with text-to-video, and you still see a lot of those common artifacts.
So as a portrayal of an AI world model, it's certainly not the case that Sora has kind of solved the world model problem of not having weird hallucinations happen when you get into gymnastics or other tricky things like that. But it is pretty impressive, of course.
Yeah, and we're getting a couple of insights from the system card that was published in terms of what the model actually consists of. Um, it is a diffusion model, so that's good to know. So you start off with a base video, um, that's a bunch of noise and you gradually remove that noise over many steps, right? That's the diffusion concept. Go from noise to information. Um, and that's kind of the training process.
Um, what they say is that they give the model foresight into, like, many frames at a time. So the model is not just looking at one frame and then trying to kind of, um, do diffusion just based on that still image and ensure consistency with another. What they're doing is they're giving it many frames at the same time, which allows the model to do things like capture object permanence, right? This idea that if you take an object, you move it out of sight and then you move it back in,
you know, you want the model to retain a sense that, hey, that object still exists, right? So it's not just going to, like, warp out of space. You know, a classic example is, you look at, say, a painting on a wall, then you move the camera's perspective a little bit off the wall. You don't see the painting anymore. And then when you move it back to the wall, the painting's gone, right? That's a lack of object permanence. That's typical of these sorts of models.
They're trying to deal with this by, again, training the model to see many, many different frames at the same time, so it can learn this idea of object permanence, among other things that improve coherence. Um, we do know that it's a transformer, uh, so that's been done presumably for the sort of scaling, uh, properties of transformers. Certainly OpenAI seems keen on, on applying the standard scaling recipe and strategy to Sora, so we'll probably keep seeing more versions of Sora.
And we do know from their blog post, uh, that it uses the recaptioning technique from DALL-E 3. So they describe this as generating highly descriptive captions for the visual training data. Basically, imagine you have an image; you generate a very, very long caption that in great detail captures what's in there, uh, in order to allow your model to develop more of a sort of, um, conceptual, semantic understanding of what is contained in that image, in a richer way.
Um, we know that they're using space time patches. We talked about the idea of space time patches, uh, previously, right? This idea that you've got essentially like a, a cube. So if you look at a still image, right, you can cut out a little square from it. Um, but then that still image, if it's in a video, uh, there's going to be another still image before and after and a whole bunch stacked like that.
So you can imagine taking a little patch of that image and extending it in, in the time domain. And now you have kind of a space-time chunk, and they're going to use those, they call them patches; essentially they're going to transform the videos into a compressed latent representation from which they can pull these space-time patches. So that's basically the architectural detail that we currently have. It's pretty hand-wavy at the moment.
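To make that space-time patch idea a bit more concrete, here's a minimal sketch in Python of how you might carve a compressed latent video into space-time patches. The tensor shape, patch sizes, and the ViT-style flattening here are our own illustrative assumptions, not anything OpenAI has published about Sora.

```python
import numpy as np

# Hypothetical compressed latent video: (time, height, width, channels).
# Real dimensions are unknown; these are illustrative assumptions.
T, H, W, C = 16, 32, 32, 8
latent_video = np.random.randn(T, H, W, C)

# Space-time patch size: a few latent frames deep, a small square spatially.
pt, ph, pw = 4, 8, 8

# Carve the latent into non-overlapping space-time chunks and flatten each
# chunk into a token vector, analogous to ViT patching extended in time.
patches = latent_video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch indices first
tokens = patches.reshape(-1, pt * ph * pw * C)      # one row per space-time patch

print(tokens.shape)  # (64, 2048): 64 patch tokens, each a 2048-dim vector
```

Those patch tokens are then what a transformer would attend over during the diffusion denoising steps, which is the part the system card leaves hand-wavy.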
Um, it, uh, it does make me think a little bit of, and we talked about this back in the day, I think it was V-JEPA that Meta came out with. Um, and they were a little bit more open about, obviously, their architecture there, because they tend to do open source stuff. Um, but that is what we know so far, so emphasis on scaling presumably continues with this, and there's a whole bunch of information about the red teaming process, as you might imagine, right?
Some concern over, in particular, how this kind of tool could be used for persuasion, right? Generating, you know, fake news or, or whatever, any kind of persuasive content. Um, and they do flag that as a thing that, you know, their reviews suggest is a risk with this, obviously not surprising. Um, but they do also say, you know, there's no evidence that it poses risk with respect to the rest of the OpenAI Preparedness evals.
So cybersecurity; chemical, biological, radiological, and nuclear risk; model autonomy; just as you'd expect, right? You're not going to have a video generation model that poses a cybersecurity risk. Fair enough. Um, but they do flag that, um, you know, you could look at things like, um, impersonation, misinformation or social engineering. And the red teaming activities, they talk about, you know, they brought folks in to do it.
Um, they, they claim 15,000 generations, uh, between September and December 2024. So this also is a bit of a more robust evaluation process, or at least more in time, it seems, than, for example, what we saw with, uh, the release of o1-preview and o1-mini a couple months ago, when OpenAI was criticized for essentially tossing the model at, uh, eval companies like METR and saying, hey, you've got like a week to do all your evals. And, you know, uh, anyway, all the problems that come from that. So here it seems to have been a much more, uh, sort of patient process. Uh, again, 15,000 generations; it's hard for me to evaluate what that really means. Is that enough? Is that, uh, too much? Um, but, uh, yeah, I mean, it seems like they've, uh, they've at least paid attention to this as they do with their other models. And the report is kind of interesting there too.
Right. And they do highlight, on that safety front, uh, that you'll get watermarks, uh, if you're not on the Pro tier. So talking about the subscription side: ChatGPT Plus subscribers, uh, the $20 per month tier, can generate up to 50 priority videos at up to, uh, 720p resolution with five-second duration. And then on the new ChatGPT Pro tier, at $200 per month, you can generate up to 500 priority videos, at up to HD resolution, 20 seconds duration, five concurrent generations.
And you can download them without a watermark. So this is an example, I think, where that $200 per month subscription tier actually gets you quite a bit if you care about generating lots of videos. And in addition to the watermark that some videos will have, they also do include C2PA metadata, from the Coalition for Content Provenance and Authenticity, which is one of these things that AI outputs increasingly have, to be able to verify whether something is AI or not.
In addition to the usual text and image inputs, they do have some other kinds of capabilities, with, for instance, remixing, uh, changing a video based on some prompt. And, um, it can take, uh, about up to one minute to generate. So it's not quite real time, takes a bit longer than a lot of what we've seen, but it is the Sora Turbo model powering this, which is presumably much faster than what they had at the beginning of the year. So overall, quite a rollout for Sora.
I think I was surprised by the degree to which they had a pretty, uh, sophisticated, uh, platform and tool, with the storyboards, with, you know, a whole other UI and website, uh, subscription tiers, a lot of stuff going on here. And if you're in the US, if you're in a lot of countries, you can now go ahead and try it out. Except if you're in the UK and EU, uh, it seems that that isn't the case. Sam Altman said it may take a while to launch there.
So, uh, speaking of the negatives of taking AI safety seriously, I mean, where do you go? You're not going to get a lot of stuff in Europe, uh, at the same time as in the US. Yeah. I
mean, I think part of that too, is just that the EU AI legal domain is, like, it's not just the fact that they're taking AI safety seriously. I think it's the fact that they are a bloated European, um, uh, you know, government organization. Like, GDPR was the same, right? There are much better ways to deal with your, if you're, uh, sort of, like, ad problem, or sorry, your privacy problem, than having pop-ups, uh, every time you visit a goddamn website.
Um, so yeah, I mean, I think this is a lesson in, uh, be careful about how you expand government because, uh, yeah. No Sora for you.
And moving on to, I would say, the other big story of the week, uh, coming from Google: they have revealed Gemini 2 and a bunch of associated stuff related to Gemini 2. So starting with Gemini 2, they have Gemini 2.0 Flash, which is a successor to 1.5 Flash. And, uh, I mean, the benchmarks here are pretty surprising. They say that it can outperform, uh, Gemini 1.5 Pro on various benchmarks and is twice as fast.
It supports multimodal inputs like images, video, and audio, and it now supports multimodal output like images, and then mixing that with text, and text-to-speech audio. Although I'm not sure if that's launched as a capability yet, it is built into Gemini 2.0 Flash. It also has tool use, like Google Search and code execution. And then in addition to all of this, uh, by the way, Gemini 2 is already available. So it's not one of these announcements, uh, where it's not going to come for a while.
You can access the chat-optimized version of 2.0 Flash Experimental, uh, in the Gemini app. You can select it as a model. And, uh, they do say that they are going to go toward agents with Gemini 2.0. They have this whole kind of little demonstration, uh, where there's an update to Project Astra, which is their sort of, uh, prototype of a universal AI assistant, and they have Project Mariner, which is an AI agent for controlling your browser.
And also another thing, Jules, an AI-powered code agent that can help developers. Uh, so a lot of stuff going on here, uh, but I think Gemini 2.0 Flash, uh, the main thing, was, uh, I think pretty, pretty cool. It sounds pretty impressive, at least according to their benchmarks.
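For anyone who wants to poke at the new model programmatically, here's a minimal sketch using Google's `google-generativeai` Python package. The exact model id (`gemini-2.0-flash-exp`) and the built-in code execution tool are assumptions based on the announcement and the SDK's existing conventions, so treat this as illustrative and check the current docs.

```python
import google.generativeai as genai

# Assumes an API key from Google AI Studio.
genai.configure(api_key="YOUR_API_KEY")

# Model id is an assumption based on the "2.0 Flash Experimental" naming.
model = genai.GenerativeModel(
    "gemini-2.0-flash-exp",
    tools="code_execution",  # built-in tool use, per the announcement
)

response = model.generate_content(
    "Write and run Python code that returns the 20th Fibonacci number."
)
print(response.text)
```

The same `generate_content` call also accepts image, audio, and video parts, which is where the multimodal input story comes in.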
Yeah, it sort of, uh, reminds me of a bit of a reverse, uh, OpenAI play, where often when Google has a big event coming up, you know, OpenAI will try to scoop them the day before, do a big launch, and everybody's talking about the OpenAI launch, not the Google thing.
Um, this is a little bit, in a sense, uh, sort of the, the script being flipped, where you have OpenAI in the middle of their 12 days of shipmas and, um, launching a whole bunch of stuff, including Sora, which you just talked about, and now Google coming out with, you know, this pretty interesting development. I, I gotta say, I mean, this is pretty interesting.
You know, we, we first started looking at companies that were heading in this direction, i.e. the agentic, uh, tool use and tool use on behalf of users direction, with Adept AI; that was sort of the first time, you know, there was a significant investment in this direction. Obviously, we talked about how they were maybe too, uh, too small to succeed, um, and, uh, then ultimately, you know, ended up getting, getting sold, depending on who you ask, for parts. And, uh, here, here's Google actually kind of
with a big push in this direction. It reminds you a lot of Anthropic and their efforts, which are quite similar actually. Um, and now you're gonna start to see the base models as well being trained with agentic potential in mind, right? That's really what this is. You're no longer viewing these as just chatbots.
Uh, really the, the training regime, uh, the synthetic data, the, you know, the way you approach PPO, the way you approach fine-tuning, is going to be geared increasingly towards agentic potential. And that's, I think, what you're starting to see here. You know, they started these training runs thinking about how these models are going to serve as agents.
And, um, and it does seem based on the demos they discuss, uh, in at least this announcement, there's some pretty impressive stuff, right? Always hard to know from demos, how generalizable they'll be.
Um, but, uh, so Project Mariner is this experimental Chrome extension they have, right, that's able to take over your web browser and do supposedly useful chores, and they report, or Wired does here, on a particular example. They say that they had an agent that was asked to plan a meal, and it goes to the supermarket Sainsbury's. If you're not in the UK, then that's, that's going to be a weird, weird chain for you. But, you know, um, there you go.
It's like the UK Trader Joe's. They log in, uh, to the user's account, add relevant items to their shopping basket. When certain items were unavailable, the model chose suitable replacements based on its own knowledge about cooking, right? World models coming in handy here. And, um, the, the last sentence, though, of course, as ever, is fairly indicative: Google declined to perform other tasks, suggesting it remains a work in progress. So, you know, again, very fragile demos, uh, is definitely a thing.
So you want to, you want to be a little careful when evaluating these things, but just to give you a sense of at least the strategic direction they're heading in. You know, they're, they're saying this is a research prototype at the moment, right? It's not meant to be the leading product line, but this is really where things are going. Right? Like, it is Sonnet 3.5 New, it is, uh, you know, Gemini 2.0, these sort of Project Mariner agents. This is where we're going, certainly into 2025.
We may be starting to see stuff like that in the next, I guess, never mind the next two weeks; now it's going to be, you know, really a 2025 story, but I think things are going to move really fast. There are a number of reasons why I think agents are really poised to make breakthroughs. A lot of, yeah, interesting scaling results and things like that we'll talk about in the research section, one big paper in particular, but I think this is a harbinger of really big things to come.
Yeah, I totally agree. I think agents, to a large extent, are more of an engineering challenge now than a conceptual research problem. And this could be a very kind of important thing for Google, because we've seen they haven't quite been able to overtake OpenAI and Anthropic in the race to have the best models, the best frontier models.
In fact, we've talked about this, I think a couple of episodes ago: when you use Gemini as a user of Claude or ChatGPT, as, as I am at least, uh, it tends to be a little disappointing in terms of its reasoning and, uh, you know, just its overall intelligence. Hopefully Gemini 2.0 helps with that. But if Google is able to augment its AI assistant that is built into Android phones, right, to have a big leg up, as you always say, Jeremy, distribution is key.
So everyone is fighting to get an agent in your hand that will be sort of your personal assistant, and Google has phones and browsers that people use, so if they have a good enough version, people will probably just default to using those agents, which, uh, is, yeah. Agents, I think we are on the side of the predictions that most people will be using AI agents as an everyday thing, uh, not too long from now. So, very important initiative for Google.
All right, then a couple more big things, although we'll try to start going a bit faster. We've spoken at length about these last two. The next one is another development from the Shipmas, uh, as they've been calling it at OpenAI. So we have a bunch of new stories; we'll cover at least a few of them. The next one that we are covering is ChatGPT Advanced Voice Mode adding video and screen sharing input.
So, uh, we saw this originally in the demos that went back, I think, to May, where when you were talking to ChatGPT, uh, live, you could also show it a stream of video, show it some equations, ask it about those equations, and it could answer that. That was not part of the launch of Advanced Voice Mode, and now it is. Now you can do that thing they demoed back originally.
In addition to that, they also added a fun little Santa mode, which has a new voice option and a snow-globe-themed interface. So, uh, yeah, uh, kind of, uh, they are shipping a lot and shipping at various levels of excitement. I think this is a fairly big deal, but certainly not, like, Sora level.
Yeah. I mean, you know, especially on the sort of application side, um, I won't pretend to, like, know anyone at OpenAI working on just the application or product side, but my guess is that the Advanced Voice Mode rollout being so delayed, right, that was a big theme people were kind of getting frustrated about. Um, it sort of reminds me a little bit of this, um, of this Gemini 2.0 thing with the very fragile demos. The edge cases can sometimes take a long time to iron out.
There's a long tail of stuff anytime, especially when you launch in a new modality, right? Because then you've got to create new evals, you've got to create new tests, new red teaming protocols, that aren't necessarily constrained to the kinds of evals you might run on a text based system, which is what OpenAI had been optimizing so hard for before. Um, so yeah, anyway, I think this will have been a new challenge for them.
And presumably, like my guess is it'll take less time for subsequent rollouts of Sora, um, for, for that reason, because they'll already have that base of expertise built out.
Next up, a story from Microsoft. It seems everyone is competing to get a story like this recently, and they also have a story regarding sort of agent capability: Microsoft's Copilot will be able to browse the web with you using AI vision. So they are trying to add this to their Edge browser. This is in testing. And with this feature, Copilot Vision, users can enable it to ask questions about the text, images, and content they are viewing, uh, to help you out. Seems not quite
as agentic; I'm pretty sure it's not going to be able to sort of take a request and go to websites on your behalf to do things. Uh, this is also currently in limited testing, available only to Copilot Pro subscribers through the Copilot Labs program. But, uh, another example: yeah, Google is doing agentic stuff for their browser, and Microsoft is definitely going to go in that direction, and this is an early preview of that. Yeah, and, and, you
know, this is another instance of the case of, um, Microsoft starting to kind of, not distance itself from OpenAI, but certainly assert its independence in a more full-throated way. Like, we have products that directly compete with OpenAI's products. They want that because, um, well, from everything I've heard, I mean, the, uh, the Sam Altman board debacle really, uh, really rocked that relationship quite, um, quite badly.
And, uh, at this point, you know, that's part of the context that Microsoft is keeping in mind as they ensure that they're not, you know, there's an antitrust piece too, but, but they're also really keen on ensuring that, uh, they have their own internal capability. And, you know, that's, that's going to be part of what this is. Again, uh, you know, distribution is king and Microsoft certainly has that through Copilot. So it'll be interesting to see the uptake on this one.
A couple more stories, starting now with X, previously Twitter, and they have launched a Grok image generation model. So, this is now initially available to select users. They say it will be rolled out globally within a week, and it's an image generation model that can generate high-quality images from text or other images. It was codenamed Aurora. And, uh, we don't know too much about it, of course, uh, but, uh, it's interesting to me that they are training their own. Uh, you know, I guess
OpenAI has DALL-E and, uh, presumably other companies have their image generation models. Uh, Grok initially allowed you to generate with Black Forest Labs' Flux. Now they have this model that is presumably in-house.
Yeah, this is also kind of interesting. Yeah, the Black Forest Labs piece. Um, I, I have no idea what they're thinking now, because, I think we talked about their big fundraise, I believe last week. Um, you know, I think they're pushing a billion dollars now.
So, you know, those sorts of valuations are dependent on, presumably, an ongoing relationship with X. And to the extent that Grok takes that over with their own native image generation stuff, I mean, that's a structural problem for Black Forest Labs. Like, I don't know how they recover from that. Um, it's, it's like, as things, you know, take off, you'll especially start to see interactions, right, between, uh, you know, Grok 3, Grok 4, and the image generation functionality.
Uh, there's all kinds of reasons why that's the case, from, um, you know, annotations of images, like highly, um, highly descriptive captions, things like that, to just the world model picture. Um, eventually, you know, multimodality is best done in one ecosystem at scale. And so, uh, you know, you're less likely to want to farm out individual use cases, individual modes like vision, to, uh, to sort of partners. So curious about what that implies for that relationship for sure.
The blog post itself, not a whole ton of information, right? Like, despite, um, X being oriented towards this open source approach, this is definitely more of a closed source sort of announcement. We don't have code, we don't have architecture information. Um, we're, uh, we're just sort of waiting, and apparently, yeah, really good photorealistic rendering, right, and precisely following text instructions.
Well, that's pretty consistent with what we've seen from, from other products so far, but interesting that it'll be native to X for sure.
And, uh, again, we don't know too much about it. It could be built on, uh, Flux, but, uh, they do claim that this is, uh, trained seemingly on their own. So yeah, impressive. Another impressive thing to have shared from xAI, given that they're sort of in catch-up mode. Next up, Cognition Labs, a startup. Uh, they kind of made a splash with a demo of Devin, an AI software engineer, as they called it. And it has now launched, uh, quite a while, not quite years, but
a few months at least, since they initially previewed it. And you can use it if you're a subscriber. So you have to pay $500 per month, for individuals and engineering teams. There is an integrated development environment extension and API, also an onboarding session and various things like that. Uh, so yeah, another piece of the agentic story here, where we've had AI, uh, code-writing assistants for a long time. I think they have been integrated pretty deeply into a lot of programmers' workflows.
I know for me, that's definitely the case. And now there is a race to make software engineering agents that can do even more on the software front.
Yeah, it's, uh, it's kind of interesting. I mean, Devin, um, was released a long time ago, right? It's like eight months ago, back in March, and at the time, there were all these impressive demos, and there were claims and counterclaims about, you know, whether this was hype or not. Um, it kind of seemed like it might have been a fairly, um, frail model in the sense that, you know, it could do the, again, this idea, it could do the demos, but could it actually perform for practical tasks?
Uh, the claim now is that this version, at least, of Devin is really good when users give it tasks they know how to do themselves. Um, and also teaching the model to test its work, keeping sessions under three hours, breaking down large tasks. Sure. Basically, it's all the stuff that you typically do if you use these sorts of tools, whether it's Copilot or something else, in your, your development. Like, this is all standard stuff. Um, so it does seem like one of those things.
I'm curious to hear the side by side. If you're going to defend that 500-buck-a-month price point, like, that's asking a lot. So, um, you know, you're, you're up against, uh, you know, OpenAI's o1, you know, 200 bucks a month for the highest paid tier, or the most expensive tier of o1. So is this really going to be two and a half times better for this use case? I think that's a really interesting question. And, uh, we'll find out soon.
Cause I think, you know, when it comes to Devin and, you know, Cognition Labs in general, they, I think they've got an uphill battle. I'm going to say the same thing that I say about them, about Cohere, about Adept AI, all these mesoscopic companies that, that haven't raised a ton: um, the reality is scaling is still working. I know that there's, there's, uh, you know, a whole bunch of, um, naysaying about scaling these days, but, um, when you actually look at what's going on, it is still working. Uh, this is why companies are pumping in billions into new data center builds, like tens of billions. Um, so I think companies like Cognition are actually going to be in deep trouble if the scaling trends do continue. And so my naive expectation is that they end up folding sometime in the next two to three years. And we'll see. Hopefully they prove me wrong.
And then this is kind of part of the problem in the space, right? Like, the rich get richer in that sense. You have the big players that can afford the big data centers that build the better models. But, um, I think this is a pretty interesting moment, I almost want to say do or die, because they either rock it against OpenAI's o1 and similar models like Claude 3.5 Sonnet New, which is probably the most direct competition here, uh, or, or they don't.
And you're, again, you're defending 500 bucks a month. That is a steep, steep price point.
And, uh, they, I don't know actually if they claim to be training their own models. I think this is an example also where the user experience piece is increasingly important if you're competing in this space. So they have, you know, the ability to use it in a browser, they have integrations for your, uh, IDE, uh, you can use it via shell, and, uh, a lot of the time it's also, uh, part of: if you kind of adopt a tool and you get to know the tool, uh, you might just stick with it, right?
And you don't necessarily need to train the model yourself. You can use Llama, you can use another API. You just need people to stick with you. And there's currently kind of a war going on with Cursor and a lot of these startups doing, uh, tools for software engineering with built-in stuff. So, um, yeah, that's a,
that's a good point. Sorry, you're right. My, yeah, my mind was absolutely going to, to that idea; you're right, as a platform slash integrator. I guess that's, that's good. Without information about, you know, specifically how it works under the hood, it's just, you know, you, you risk all the standard things where you're going up against the big players with distribution, um, and, and getting swallowed whole as well by the, the UX and UI of, of OpenAI, of Claude. But you're absolutely right.
Yeah, that's a distinct, uh, set of risks.
And, uh, yeah, to your point of comparison, just looking at their blog right now, they did post a review of OpenAI o1, uh, and then talking about coding agents. So, uh, yeah, there's a bit of worry there, I think, with all this work on agents from a whole bunch of people. All right. Even more stories on news, we just got a couple more. The next one is part of the shipmas trend at OpenAI, but also some more.
This one is about Apple, and they have launched, uh, iOS 18.2. Part of that was the new ChatGPT integration with Siri. So at long last, you can do that. Uh, users do not need an OpenAI account to use the integration, but you can opt for upgraded ChatGPT versions through Apple, and, uh, there are apparently also privacy protections where OpenAI does not store requests.
In addition to that integration, you also get some things like Genmoji, better, uh, text tools, and, uh, you know, a variety of features as we've seen with Apple Intelligence. So, yeah, uh, nice to see this coming here. It took a bit longer, maybe, than people expected, at least than I expected, but certainly, uh, important for Siri to stay viable.
Yeah, man. And with that, uh, OpenAI is now partnering, interestingly, with both Microsoft and Apple at very large scale, which is highly unusual, right? Like, the Apple-Microsoft rivalry is one of the longest standing ones in modern Silicon Valley history. So, you know, this is quite a, quite a feat by Sam A, being able to Machiavelli his way into a close relationship with both these companies.
Um, the other thing, too, is Apple is known to be building their own internal language models. Their hope is really to do a lot more of this internally. And, um, so it'll be interesting, you know, from a data standpoint. I don't remember the details of the data flow in this exchange, right? Like, what data stays on, on Apple? So I mean, I've heard the claim that the user data stays on, on Apple hardware and doesn't touch OpenAI hardware.
I forget how that's implemented exactly, but that'll be a central concern here, and a big reason for Apple to take this in-house, have their own LLMs as much as possible serving up chatbots. But yep, for, for now, it seems like, and I don't know what to call this, like, uh, an alliance of convenience for the moment for these two, uh, two big players.
And the way this works, if you are a user of Siri, this should just kick in if you ask Siri a complicated question that it cannot handle. Uh, it will then ask the user for permission to access ChatGPT to answer the question. So perhaps you'll start seeing it if you give Siri tricky questions. And on to the last story; this one is about Reddit. They have a new AI search tool. So this is called Reddit Answers. And it is what it sounds like. You can ask this tool a question.
It will look through Reddit, presumably, and provide you an answer, which would mean that instead of Googling for what people are saying on Reddit, maybe you will actually go to Reddit and ask this thing what people on Reddit are saying with respect to various things. Still initially available to a limited number of users in the US and in English, but it will presumably soon expand to more languages and to Android and other things like that. Yeah, this is
actually part of a really interesting battle, you know, subplot, let's say, is playing out in the search space. So Reddit, and I don't know if you found this over the last sort of two years, I found myself increasingly Googling to find subreddits. Like, basically, um, the, the real answers I want are on some, you know, machine learning subreddit, right, or, or some, uh, I don't know, some, some, what's other stuff that I do?
I don't do a lot of stuff, but basically, things like that, and, um, and, and what, so you're functionally using Google just as a, a way of getting into Reddit, which is a sign that, you know, Google's in a bit of trouble, right?
If, if that's what they're, they're leaning on, if, if you're finding yourself more and more drawn towards a certain platform, admittedly for some use cases, um, then it makes it very tempting for Reddit to just say, hey, you know what, we'll just make it, um, uh, a lot easier to kind of use the AI-augmented tool set, you know, have summarizers, have search products and things like that natively. At the same time, Google is playing with, right?
This whole idea of summarizing websites rather than just serving up websites, which is a threat to websites like Reddit, because, hey, maybe you don't have to click through. Maybe you don't have to actually, you know, give them your eyeballs. You can just give those eyeballs to Google entirely. And so this is all kind of part of the landscape shifting underneath mostly Google's feet. And I think this is a structural risk for Google in the long term. Certainly, um, search is going to change.
We just don't know exactly how; we don't know what the form factor of the final product will be. Um, but, uh, another instance of that trend here. Yeah.
And, uh, they gave one example, at least, uh, in this article: tips for flying with a baby for the first time, which is the kind of thing you might ask for on Reddit, and it gives you a well-formatted response with built-in links to the original discussion. So in a way, quite similar to the overall trend with AI search, where, you know, it looks up a bunch of, uh, articles, or in this case, uh, Reddit conversations.
It just summarizes for you in a new AI generated, uh, kind of answer that combines all that information and provides you the links to go back to the original source of that. So, uh, yeah, I totally agree. I think often people do use Google to find discussions of stuff they are thinking about on Reddit and perhaps this will start to change that. We'll see. All right, that's it for tools and applications. Quite a lot this last week. Moving on to applications and business.
And the first story is once again about OpenAI. The summary is OpenAI is aiming to eliminate the Microsoft AGI rule to boost future investment. So this is like reportedly, according to people inside, not anything official here. There is a rule that prevents Microsoft from accessing future AGI technology. Uh, so that was put in place long ago.
Uh, that would kind of mean that, um, I think, essentially, OpenAI would have control once you get to what they deem AGI; uh, commercial partners won't necessarily be able to get access to that. Uh, this was back from the days when it was a nonprofit. Well, now it's trying to go for-profit and, uh, various things are changing. Potentially this would be one of those things.
Yeah, this is, this is kind of interesting, right? Just because of the way OpenAI originally framed this carve-out, right? When there was this first big Microsoft investment, the 10 billion. Actually, sorry, before that, the one that was around, like, 1 billion. When they put that in, um, you know, the claim was made: well, look, um, now you have all these lofty goals about ensuring the benefits of AI are for everyone, shared with everyone, and that you build it safely.
You're gonna actually invest in, in things like superalignment. Um, and now you're, you're partnering with Microsoft in a way that gives them access to your IP. So, like, you know what? What value are your guarantees of how you're gonna treat the technology if you're attached at the hip to somebody who is not bound by those, those constraints? And, um, and the response there was, okay, well, don't worry.
We have this clause in our agreement, as you said, that prevents Microsoft from accessing AGI. They can access anything else, right? That's part of the agreement. But once we reach AGI, which they define internally as quote, highly autonomous systems that outperform humans at most economically valuable work. Then Microsoft won't be able to access that tech. Now you may be asking yourself, highly autonomous system that outperforms humans at most economically valuable work. That sounds very fuzzy.
Surely somebody has to determine what that means and move across that threshold. And the answer is yes, the opening eye board, the board of the nonprofit was to determine when that threshold was achieved. And therefore, when Microsoft's access to opening eyes, technology would get Off right now.
the problem is, if you're asking Microsoft and other big players to come in and invest giant wads of cash to fuel your continued scaling, you really have no choice but to say, okay, open kimono, you're gonna be able to use all this tech. That's a big issue, right? That's a big issue for OpenAI. On their website right now, it says, quote, AGI is explicitly carved out of all commercial and IP licensing agreements.
Um, this was explicitly done to prevent, uh, you know, the kind of people who are less security- and safety-conscious than, uh, you know, whatever OpenAI would say it currently is, from accessing the technology, and now they're rolling that back, right?
So this actually, I think, would rightfully be viewed by a lot of, like, early OpenAI cheerleaders as a direct kind of contravention of their earlier principles, um, sacrificing a principle for the ability to continue to scale, which is a requirement. Like, look, we're in a scaling race. OpenAI has no choice. They need to be able to bring in fresh capital because the CapEx requirements of scaling are so insane.
Um, but here's Sam Altman explaining that at a New York Times conference, uh, just this past Wednesday. He said, quote, and you have to imagine I'm speaking with vocal fry here: when we started, we had no idea we were going to be a product company or that the capital we needed would turn out to be so huge. If we knew those things, we would have picked a different structure. And that's all very interesting because it ties back.
I've been hearing tons of stuff from, from friends at OpenAI, some folks even who've, who've worked, um, anyway, in, in, let's say, Sam's orbit, that his view is that, like, oh, the problem was that the corporate structure, um, it was all wrong to begin with. And the fundamental challenge he will face if he tries to make that argument to himself and to others is that the principles themselves, the principles that underpinned OpenAI's activity,
um, these lofty ideals of, you know, safety and security and all this stuff, um, those are, by OpenAI's own arguments back then, betrayed by this action. That's what, at least, it seems like to, I think, a lot of people. I think there's a pretty strong argument there. There's an argument out of necessity, though, that trumps everything, which is just like, yeah, well, if we want to have any role in this brave new world, we have to be able to scale. That means we have to be able to go for-profit.
We have to be able to ditch clauses like this. But, um, I think this is really, really tricky, especially, you know, when you think about the whole transition from nonprofit to for-profit, you know, OpenAI right now. Um, and, you know, Sam Altman's word in particular is starting to look pretty unreliable. Like, it's honestly difficult to think of things that OpenAI committed to back in the day that they still are, like, really sticking to.
We've seen failures on the, like, complete catastrophic failure to fund superalignment, and, and people, literally, like, three rounds of successive superalignment leadership, just ditching the company. And
the promise was, uh, that 20 percent of resources would go to safety, right? Which reportedly hasn't been the case. And that was part of the frustration, presumably, internally.
Exactly. Ambiguity there too about whether it was 20 percent of, of, like, uh, the compute stockpile they'd acquired up to that point, or 20 percent going forward, all these things. You could, you could argue that that ambiguity, by the way, was a feature and not a bug, uh, and that it made it easier to kind of, uh, claim that they were adhering, but even by any reasonable standard, it seems that they just flopped on that.
Um, and there are many such cases, like many, many such cases, uh, of which this seems to be just yet another example. So I don't know what the interaction will be. I'm not a lawyer. I don't know what the interaction here will be with the, uh, the nonprofit to for-profit, uh, changeover, but man, is the, uh, is the list getting long now with OpenAI.
And, uh, yeah, I wouldn't be surprised if this move is something Microsoft really wants before they are willing to pump more money in. As you said, the question of what is AGI and what isn't is pretty nebulous. And so if you're Microsoft, you'd be like, well, I don't know, you could call something AGI even if we disagree, right? Some people might argue, uh, oh, this is AGI already. Someone, I think within OpenAI, posted basically saying that, which, uh,
I posted basically saying that, which, uh, Yeah, that's not good if you could just say like, Oh, this is a thing you want to get access to, uh, because we think it's AGI. So, uh, certainly from a business perspective, it makes a lot of sense. And on to the second story, something we haven't touched on in a little while, but personally I think it's a bit of a big deal.
The story is that GM has halted funding of robotaxi development by Cruise, uh, ending kind of a long ongoing, uh, tragedy, you could say, that we've been covering for a while now. A
slow-motion car crash, one might say. Sorry, I'll see myself out.
Uh, yes. Uh, well, so what happened? Just a quick recap. Cruise had a major incident over a year ago now, I believe, where they were partially at fault, let's say, for an injury someone sustained; there was a car crash due to a human driver, but then the Cruise car pulled over in a way that hurt someone. And the big problem was that Cruise's communications with regulators were, let's say, dodgy. They didn't fully disclose everything. They weren't fully cooperative. That led to a bunch of issues.
Cruise was at that point testing on San Francisco streets, just like Waymo was. That ended. We've seen kind of a slow movement towards getting back into the game from Cruise. But it's always been a question of whether they will try to compete with Waymo and, increasingly, Tesla. And now they are pretty much bowing out. It's pretty clear that GM is planning to acquire the remaining Cruise shares and will then kind of fold it in, presumably to use that technology in their cars.
So, uh, yeah, now it's pretty much two big players. It seems that basically Waymo and Tesla are the two potential providers of self-driving robotaxis. Waymo increasingly rolling out throughout the U.S., but somewhat slowly. Tesla increasingly improving their FSD software. Recently they launched FSD 13, which is at this point looking pretty impressive.
Much less sort of scary to let it take over and drive you around, much more human-like, they say, because of the end-to-end training from data, going just from video. So one of these things that right now might fly under the radar, but in a year I do foresee a lot of robotaxis being everywhere, and it's Waymo or Tesla or both that will dominate.
Yeah, the, the Cruise-GM relationship has been an interesting and rocky one, uh, for, for some time. Um, Kyle Vogt, who was the founder of the company, um, and, uh, took it through, famously, Y Combinator, actually left last November, and, and after, uh, he left, he, he put up a tweet saying, in case it was unclear before, it is clear now: GM are a bunch of dummies. So, uh, you know, it's certainly, certainly a rocky history there.
There is also Honda, which is an outside investor in Cruise. Um, they put in about 800 million or 850 million into Cruise up till now. Um, and, and they're basically, they had planned to launch a driverless ride-hail service in Japan in 2026, but they're saying they'll now be reassessing those plans.
Um, and, uh, anyway, kind of interesting, uh, step back from, from both these players at the same time, as you say, two, two big players left, uh, kind of interesting in, in the, uh, driverless space.
Moving on to the lightning round, and we have a bunch of stories on hardware. First up, the largest AI data center in the world is to be built in northwest Alberta. So the name of this one is, uh, Wonder Valley; that is what the largest AI data center will be called, and it will cost, uh, an estimated 70 billion dollars, uh, funded by a collaboration between the Municipal District of Greenview and O'Leary Ventures, which is led by Canadian millionaire Kevin O'Leary.
So, total surprise to me. Uh, Jeremy, I'm, I'm guessing you have more to add on this.
Yeah, so this is, uh, this is insane. This is a huge story. Um, I think this is from an infrastructure standpoint, one of the biggest stories maybe of the quarter, um, if not the biggest story. Uh, so right now, just to situate you a little bit, people are struggling to find a spare gigawatt of power, right? So, um, for context, uh, so one, H100 GPU very roughly will set you back about a kilowatt, right? It consumes a kilowatt of power.
So if you want 1,000, um, uh, H100 GPUs in your data center, you're going to need a megawatt, right? Um, if you want a million H100 GPUs, you're gonna want a gigawatt. When we talk about a gigawatt of power, we're talking about, roughly speaking, order of magnitude, about, uh, 1 million, uh, Nvidia H100s or, or the equivalent. And, um, and what we're now seeing is companies like Meta looking for that. You know, that big two-gigawatt cluster, right?
The one gigawatt cluster, 1.5, stuff like that. There really is no plan right now to hit the 10 gigawatt cluster. In other words, the 10 million H 100 equivalent cluster that is going to require massive infrastructure build out. Um, this is really noteworthy because like out of nowhere, Canada is relevant now for some reason, right?
People have been looking all over North America for sites where you can build a large structure like this, one that can accommodate really pushing that 10-million-GPU threshold, and that's really what this means. Now, this project is going to unfold in phases, right? It's not going to all happen at once. Phase one is going to involve the first 1.4 gigawatts of power that's going to be brought online.
They plan to bring on an additional one gigawatt of power each year afterwards. And again, very roughly, one kilowatt is roughly one GPU, which is roughly one home. So you're thinking here about the equivalent of powering an additional 1 million homes per year in this one little area. That's a pretty remarkable infrastructure build-out. And the phase one build-out, again, for that first 1.4 gigawatts, is estimated to cost about 2.8 billion dollars.
So the vast majority is coming in later as they look to expand. This is a big increase in Canada's baseline power generation capacity, which is about 150 gigawatts. So here we're looking to increase that by, well, what is that, by about 5 percent if my math is right? Yeah, about 5 percent. That's a 5 percent increase in available power just from this one site.
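To make the back-of-envelope math above concrete, here is a rough sketch in Python. The one-kilowatt-per-H100 figure is an approximation that folds in cooling and overhead, and the 150-gigawatt number is just the rough capacity figure cited above, not an official statistic.

```python
# Back-of-envelope data center power math (rough approximations, not exact figures).
KW_PER_H100 = 1.0  # ~1 kW per H100 including cooling/overhead (very rough)

def h100_equivalents(total_gigawatts: float) -> float:
    """Roughly how many H100-class GPUs a given amount of power can support."""
    total_kw = total_gigawatts * 1e6  # 1 GW = 1,000,000 kW
    return total_kw / KW_PER_H100

print(f"{h100_equivalents(1.0):,.0f} GPUs per gigawatt")          # ~1 million
print(f"{h100_equivalents(7.5):,.0f} GPUs at the full 7.5 GW build")  # ~7.5 million

# Wonder Valley's 7.5 GW relative to Canada's ~150 GW of generating capacity:
print(f"{7.5 / 150:.0%} of national capacity")                     # ~5%
```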
And you need that to be able to power the cooling, the GPUs, the infrastructure, all that good stuff. But it makes this location a really interesting geostrategic location. Like, all of a sudden it makes it relevant. Now the timeline is tricky, right? So what we're hearing here is: we're hitting 7.5 gigawatts, yes, that's the goal. The idea, though, is to have that online over the next five to 10 years.
And so that's part of the challenge when you think about it, especially if your AGI timelines are, like, 2027, something like that. Then you may think of this as too little, too late, at least at the full 7.5 gigawatts, but still, the 1.4 gigawatts that will be coming online sooner may be relevant. So overall, really interesting. Why is this happening in Wonder Valley, Alberta, like in the middle of nowhere? Well, the answer is, number one, oil sands.
So Alberta is a Canadian province that is known for having a lot of natural gas, thanks to the Alberta oil sands. Also not mentioned in the article, but potentially quite helpful: it is cold as fuck up there, so cooling is potentially made significantly easier by that. And then there's all kinds of pipeline infrastructure that's been developed there. Alberta is, obviously, the Texas of Canada.
So they produce all the oil. People there even have, like, stampedes and shit. It's basically just Texas in Canada. And as a result, there are all kinds of pipelines that allow you to move resources around very easily. And there's also a fiber optic network that's set up. So a lot of reasons why this is a really kind of promising site. And Kevin O'Leary of Shark Tank fame, right?
If you've seen that, you know, he is Canadian, but he is also a bit of a figure in the American world; he's testified in Congress about crypto and he's done all kinds of stuff like that. So I think we'll be seeing more from this project. This is really, really interesting.
Um, one hopes that they will be engaging the appropriate national security assets to secure a site like this, because although it may not seem so today, if you buy into the premise that AI systems will be more and more weaponizable, this is going to be a national security asset, maybe first and foremost.
And on a very related, somewhat similar story, Meta has announced a four-million-square-foot data center in Louisiana, which will cost about 10 billion dollars, use two gigawatts of power, and will be used to train the Llama AI models. They say they have pledged to match the electricity use with 100 percent clean and renewable energy and will be working with a company called Entergy to bring at least 1.5 gigawatts of new renewable energy to the grid.
So yeah, very much a similar story and the one we've increasingly seen from all of these massive companies. Yeah, and two
gigawatts again. I mean, this is a very significant amount of power, but pending regulatory approval, right? That important phrase. What I will say is, right now the new generators are expected to come online somewhere between 2028 and 2029, pending regulatory approval. That probably is going to be a lot faster, given the Trump administration's agenda of deep, massive deregulation of American energy infrastructure, which, at least in my opinion, is a very important thing to do.
I think even the Biden administration has a task force that they've set up to look into, like, how can we do some of this stuff? So expect the timeline associated, at least with the regulatory hurdles, to get cut back fairly significantly. It's something that I've actually been working on quite a bit as well. It's just, like, how do you do this?
How do you deregulate the energy piece to make sure that you can unlock American energy production, and importantly the AI side, in a way that's secure? Apparently there are nine buildings, and there's going to be work that starts actually this month, in December, with construction continuing through 2030. One of the interesting things about these sites is they're basically never finished.
And once they are finished, they have a pretty short shelf life before they're no longer relevant, because the next generation of hardware comes out. So, yeah, it's sort of like this living building, I guess. The overall development is known as Project Sucre; sucre, actually, is the French word for sugar. No idea why that is, well, I guess French because Louisiana, but there you have it.
So, anyway, they go into the details: there's like 2,200 megawatts worth of power coming from combined-cycle combustion turbines, and there are two substations, which have crazy long backlogs, by the way. Anyway, there's all kinds of stuff they've got to put together to get this into shape, but it's going to be a big deal. It'll be used to train Llama models of the future and, um, puts Meta on the map.
Yeah, fun fact, this is the 27th data center of Meta, and they also say this will be their largest to date, so setting some records at Meta. And one more story on this front, we have one from Google, and it is said that their future data centers will be built next to solar and wind farms. This is in relation to them partnering with Intersect Power and TPG
Rise Climate, and they say that this will be a way to build data centers powered by on-site renewable energy, which according to them is a first-of-its-kind partnership. It's a 20-billion-dollar initiative. So curious to see how significant you think this is, Jeremy.
Yeah, I mean, the sourcing of power is kind of interesting; it's something that companies can be very showy with. Meta's done this a lot, where they'll, like, build out some solar or wind thing. One of the big challenges with solar and wind is that the demands, especially when it comes to training models, are such that you need constantly high throughput of power, right? High baseload power.
And unfortunately, you know, the wind isn't always blowing and the sun isn't always shining. So when you look at renewables, this is kind of a serious issue. So in practice, a lot of these data centers, while they were sometimes built next to or concurrently with a bunch of renewables, which these companies will do for the headline value,
in practice they typically draw down, like, natural gas or some sort of, you know, whatever spare nuclear power there is on the grid, stuff like that. So this is sort of an instance of that trend. And it'll be interesting to see if they can find ways to solve for the variability in power generation there. But one of the things that this is also an instance of is the trend of companies going behind the meter.
So basically, you'd be in front of the meter with your build, in which case you're drawing power from utilities, or you can go behind the meter, in which case you basically have an agreement with a power provider, like a power plant, and draw your power directly from them. That's really what's going on here. So Intersect Power in this case would own, develop, and operate the co-located plant. And, anyway, so this is the agreement they have there.
Intersect Power also has 800 million dollars in funding from Google, and so the kind of interconnection between power generation companies and big tech companies is really starting to become a thing right now. Like, it is the case that AI is eating everything. And it's sort of interesting to see that you've got to become a power company, you've got to become a hardware design firm, like, all these things that it takes to scale AI to massive scale.
Yeah, it is quite interesting, because obviously data centers have been a thing for a couple of decades. Google and Meta have massive data centers; they've dealt with sort of similar needs. Presumably you do need a lot of power for data centers in general, but now with these AI data centers, it's much, much harder. And, yeah, I'm sure there's going to be an interesting book written just on that topic alone.
But there are going to be a lot of books that need to be written about AI and what's going on right now, I guess. On to projects and open source. We had a bunch of stories in the previous episode, only one this time, and it is about Google again. They have released PaliGemma 2. These new PaliGemma models are vision-language models, with 3 billion, 10 billion, and 28 billion parameter variants and resolutions that vary as well.
They have nine pre-trained models with different size and resolution combinations as well. So yeah, we've seen these Gemma models come out from Google pretty regularly now. And this is seemingly getting strong performance for things like text detection, optical music score recognition, and radiography report generation. So, you know, kind of a big deal. VLMs are
a little less prominent on the open source front, and this is a pretty significant VLM, then.
Yeah, and there are sort of two takeaways as well from the paper itself that struck me as especially interesting. You know, one of the findings was: the larger the model you have, the lower the optimal transfer learning rates were during training. So while they're training these models on a bunch of different tasks, they discovered this pattern about the learning rate. The learning rate, by the way, is how much you change the model, right?
By how much, with every batch of data, do you update your model weights, your model parameter values? A big learning rate is like making big changes, a big step size in parameter space, right? A smaller learning rate is a smaller step size. A common way this sort of takes shape is you'll tend to want larger learning rates at the beginning of your training process because your weights initially are just complete garbage.
They're randomly initialized. And then over time as your model gets better, you want to reduce the learning rate as you kind of get more and more refined, make smaller and smaller tweaks to your model as it learns. That's a bit of an intuition pump for this. Um, so in this case, they were interested in kind of crossing over between different kinds of problems.
And what they found was: the larger the model, the smaller you want your learning rate to be. And this is kind of interesting. I mean, maybe some intuition for this is, you know, if you have a large number of degrees of freedom, you can kind of tweak them all a little; it just allows you to make more nuanced movements.
Whereas you need to make more significant movements when you have fewer degrees of freedom to learn the same thing, let's say. Kind of an interesting result. The other one was, apparently, increasing the resolution of images has a similar computational cost to increasing model size, which I found a little bit confusing at first. It actually does make sense.
Ultimately, you know, when you increase model size, the reason it costs you more compute, obviously, is you have more parameters to dial in, right? So you just have more moving parts to refine every time you pass a batch of training data through, and more to compute in the forward pass as well.
But the issue here is that if you get a bigger image, it will be associated with an encoding that has just more moving parts as well, in particular more tokens that your model has to go over. And so, sure, you may not be
using a larger model to process those inputs, but a larger image still involves more computation because it's essentially, well, it's more data. And it sounds pretty intuitive, at least when you put it that way. So you've got two different ways to increase the compute spend on your problem set. You can keep resolution fixed but increase the model size, right?
So you could go from your 3 billion parameter model to your 10 billion parameter model. Or you could keep the model size fixed but increase resolution. And depending on the regime you're in, it can actually be more efficient to do one or the other; they find regimes where one is more or less compute optimal. So I thought that was kind of interesting. And, again, more for the scaling literature that I'm sure we'll come back to later.
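For a rough sense of why higher resolution and a bigger model cost similar amounts of compute, here is an illustrative sketch. The patch size, parameter counts, and the 2-times-parameters-times-tokens FLOPs rule of thumb are assumptions for illustration, not PaliGemma's actual accounting.

```python
# Rough FLOPs comparison: bigger model vs. higher image resolution.
# Illustrative only -- parameter counts and the 2*N*T FLOPs rule of thumb are
# approximations, not the paper's exact numbers.

def vit_tokens(resolution: int, patch: int = 14) -> int:
    """Number of image tokens for a square image split into patch x patch pieces."""
    return (resolution // patch) ** 2

def approx_flops(params: float, tokens: int) -> float:
    """Very rough forward-pass FLOPs: ~2 * parameters * tokens processed."""
    return 2 * params * tokens

base = approx_flops(params=3e9, tokens=vit_tokens(224))          # small model, low res
bigger_model = approx_flops(params=10e9, tokens=vit_tokens(224)) # scale the model
higher_res = approx_flops(params=3e9, tokens=vit_tokens(448))    # scale the resolution

print(f"3B @ 224px : {base:.2e} FLOPs")
print(f"10B @ 224px: {bigger_model:.2e} FLOPs (~{bigger_model / base:.1f}x)")
print(f"3B @ 448px : {higher_res:.2e} FLOPs (~{higher_res / base:.1f}x)")
```

Either knob, the model or the resolution, bumps compute by a factor of around three to four in this toy setup, which is the intuition behind the comparison in the paper.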
Yeah, this is the part I found quite interesting. They found basically three groups of tasks. One group where the two were about similar in terms of the improvement they gave; this is actually the majority of the tasks, things like segmentation, for instance, where they found that making the model bigger or increasing resolution were both pretty effective. But there were examples like TextVQA, where the resolution really helped more, or DocVQA, for instance, which makes some sense, right?
If you need to read text, probably higher resolution helps quite a bit. And then, uh, they do have other examples like science QA, for instance, where maybe because the model is bigger, it can better answer scientific questions and has more information in it. Uh, so certainly something I haven't seen before and an interesting result from this paper. And speaking of papers, moving on to the research and advancement section.
And we begin with a pretty cool paper, Training Large Language Models to Reason in a Continuous Latent Space. So in the reasoning paradigm, in things like o1, what you've been seeing in general is: often the way to reason is you literally tell the model to think through a set of steps it would need to solve this problem, then execute each of these steps.
In some cases, you, like, review your answer and then iterate on your answer, see if there's anything wrong with it, et cetera. And all of that is done via outputting text, right, and feeding that text back to the model. So what this paper proposes, they have this new reasoning paradigm they call Coconut, is to take the hidden state, the non-text, the very kind of soup of numbers that somehow encodes meaning in a large language model,
and then feed that into the model as the reasoning step, instead of converting that hidden state into words first. And so they call this continuous thought, because these numbers are a continuous representation of what would become text, which is discrete; there's a set of letters that you can choose from. So this approach has a lot of benefits. You can explore multiple reasoning paths.
You don't need to decode, which is one of the very costly operations of LLMs; going from representation to text requires decoding. So this certainly kind of augments your ability to do things like chain-of-thought reasoning. And they do show in experiments that this outperforms chain of thought in logical reasoning tasks, with fewer tokens during inference.
Yeah, for my money, this is really the paper of the week by far, right? In fact, the story of the week by far. The implications here are really, really wide-ranging. And I think this is going to be rolled into, if it's not already, frankly, it's going to be rolled into the training schemes that we see for agentic systems very soon.
So the basic frame here is, you know, your text-based reasoning that you see, as you said, with chain of thought, right, where the model will explicitly write out its own chain of thought and use it to help guide it towards more optimal solutions: that approach is not ideal. It's not the best way for these models to reason. Most tokens are used for things like textual coherence, grammar, things like that, not essential for reasoning.
By contrast, you have some tokens that really require a lot of thought and complex planning, right? Think about, for example, a sentence like, "the best next move on this chessboard is blank," right? That blank, you'd want your model to really think hard about what that next piece, that next token, is.
But with current approaches, you're basically spending the same amount of compute on that word as you are on the word "the," right, which is not terribly informative. So that's kind of an interesting intuition pump for why you might want another approach, an approach that doesn't involve explicitly laying things out in plain English.
So what they are doing is: if you imagine your transformer, you feed in your prompt, your input tokens, and those prompts get turned into embeddings, basically just a list of numbers that represents that initial prompt.
And then that list of numbers gets chewed on, gets multiplied essentially by matrices all the way down, until you get a final vector, a final list of numbers that's been chewed on a whole hell of a lot. And that final list of numbers, the last hidden state, is what normally gets decoded into an output token, a word, an actual word that you can interpret and understand.
But what they're going to do here is take that last hidden state and, instead of decoding it, turn around and feed it back into the model at the very bottom, in the position of the input embedding, and have it go through yet again. And what they're doing is essentially causing the model to chew on that token again; that's one way of thinking about it. But the way they're going to train the model is by using a chain-of-thought dataset.
So what they do is they start by saying, okay, imagine you have a chain of thought that goes like: okay, I'll start by solving this problem. Step one, I'm going to do this. Step two, I'm going to do this. Step three, I'm going to do this, and so on.
And what they're going to do in the training process is use that expensive-to-collect chain-of-thought dataset. So, sorry, let me take a step back. When this model generates its last hidden state, right, when you finish propagating your data through, you have your last hidden state. Normally you would decode to an output token. But now you're, again, feeding it back into the bottom of the model.
Well, you still need a token that gets spat out, just for your model to be coherent, essentially to respect the fact that it's an autoregressive model. And so what they're going to do is essentially put out, like, a thought token in that position as the output. Now you can essentially decide how many thought tokens you want between your input and your answer.
And that allows you to control how much thought, how much inference-time compute, your model is investing in generating that answer, in a very interesting and fairly objective way. So, number one, that's a really interesting way of quantifying semi-objectively the amount of compute that goes into your inference-time strategy, right? We haven't seen stuff like that before. I think it's exciting for that reason in part.
But the other thing that makes it interesting is the training process. So, that thought token that you're spitting out: what they do is they'll take their chain-of-thought dataset and, at first, kind of blank out one step, like, say, step one, and they'll try to get the model to just replace it with a thought token, so not actually spit out the step-one reasoning in text. But they'll keep step two and step three.
They'll allow it to reason in plain English for those steps. And then in a later round of training, they'll replace step two, and then step three. So it's this iterative process where you're getting the model to reason in latent space for more and more of the problem, and that allows you to get the model to converge in a more robust way.
Last thing I'm going to say: there's so much here; if you're going to read one paper cover to cover this quarter, make it this paper. This is a really, really important paper. One of the key things they point out is that in traditional chain of thought, when the model is about to generate a token, one piece of text, the thing that gets decoded, the last hidden state,
actually encodes a probability distribution over tokens. When you force it to actually decode, to give you one token, you're kind of telling it: look, I know you think the solution could start with any one of a dozen different possible tokens; I'm going to force you to just pick the one you think is most likely. Now, in that process, what you're really doing is destroying all of the potentialities that the model was considering exploring.
It was essentially in the state where, you know, as you might be, if you're thinking about solving a problem, you might be like, well, you know, my approach might involve this strategy or this strategy. I'm not really sure which one to try first. But then it's basically forcing you to go, okay, I'm going to commit to this one. And once it commits, once it actually decodes that token in a conventional chain of thought strategy, then it cuts off all those other possibilities.
It ceases essentially to explore the full space of possible solutions, and it gets locked in. And then in the next stage, when it's going through this process of producing the next token in the sequence, again it's going to go, okay, well, sure, I'm locked in on the first token, but for the second token there's a wide range of potentialities I could explore. Again, it'll be forced to lock in.
This is really interesting, because by keeping the reasoning in that latent space, by keeping the last hidden state and not decoding it, you're allowing the model to simultaneously consider and explore a bunch of different strategies; all the different possibilities that come from token one then get compounded with the possibilities that come from token two, without being interrupted by that collapse of the solution into one possible mode.
And there are all kinds of implications for that. They do a great analysis of how you can actually then view this process as a sort of tree, a mesh, a network of possible solutions that are being explored at the same time, and how you can use that in turn to measure how effective the reasoning is. This is a paper to read. This is a paper to look into deeply.
It is, by the way, kind of interesting: they use a pre-trained version of GPT-2 as the base model for all these experiments. There are a whole bunch of reasons I think we should suspect that this will greatly improve with scaling. GPT-2 obviously is a tiny, tiny model. But the kinds of things that we're looking at here, exploring many different paths, things that look like chain of thought:
these are all things that we've seen improve a lot with scale. So, anyway, I think this is a really, really big deal for a lot of reasons, and I wish we could do a whole episode on it.
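A minimal sketch of the latent-feedback loop described above, assuming a Hugging Face-style causal LM; the prompt, the number of latent thoughts, and the greedy decode at the end are illustrative choices, not the paper's actual training or inference code (which also relies on the curriculum training discussed next).

```python
# Sketch of "continuous thought" a la Coconut: instead of decoding the last hidden
# state into a text token, feed it back in as the next input embedding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # the paper experiments with a pre-trained GPT-2
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

prompt = "Q: If Alice has 3 apples and buys 2 more, how many does she have? A:"
input_embeds = model.get_input_embeddings()(tok(prompt, return_tensors="pt").input_ids)

num_latent_thoughts = 4  # how many latent "thought" steps to spend before answering
with torch.no_grad():
    for _ in range(num_latent_thoughts):
        out = model(inputs_embeds=input_embeds)
        last_hidden = out.hidden_states[-1][:, -1:, :]  # final hidden state, last position
        # Key idea: skip decoding; append the hidden state itself as the next "input token".
        input_embeds = torch.cat([input_embeds, last_hidden], dim=1)

    # After the latent thoughts, decode normally to produce a visible answer token.
    logits = model(inputs_embeds=input_embeds).logits[:, -1, :]
    print(tok.decode(logits.argmax(-1)))
```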
Yeah, and there's even more to say; there's quite a lot going on here. So an interesting thing going on, for instance, is that ideally the model could just be trained: you know, if we just do optimization, it will learn to do this. A similar notion is recursive models, right, where you feed the output back to the model itself and it gets better every time. In practice, they did find that you need to do curriculum training.
So there's a special training regime that they do; this is training with kind of a variety of objectives over time. They also compare to another paper called iCoT, which is internalized chain-of-thought reasoning. So there's another paradigm, from earlier this year, where instead of doing this, which is essentially taking chain-of-thought reasoning and training a model to do chain-of-thought reasoning
in this continuous space, you instead try to optimize the model to do the chain-of-thought reasoning implicitly. You train it to be able to output the same answer it would give if it had done chain-of-thought reasoning, without outputting and decoding that chain of thought. That also works pretty well, as you might expect, and that is one thing you could actually combine here. And they say that this is a future avenue of research; maybe you can do both.
You could optimize the model to implicitly do chain-of-thought reasoning and also empower it to do, I guess, additional continuous chain-of-thought reasoning. And they do show that this technique works better than implicit chain-of-thought reasoning, but, you know, they're both pretty strong techniques here. Yeah, lots we could go into, but we probably don't have too much time, so we'll have to leave it at that.
Next paper, also a pretty notable one I think from this past week. The title is An Evolved Universal Transformer Memory. This is from Asana, a startup that has, you know, made some waves. Sakana, sorry. Oh man, I'm showing the toolset I use in my engineering day to day. Sakana, yes, which was started by some researchers that were quite experienced in the evolutionary area of optimization, where you don't do gradient descent; you instead do this non-differentiable method of optimization.
I guess, I don't know if it's too technical, but basically you can optimize for things that you can't with the usual way that neural nets are trained. And they find that this is an example: you can train this neural attention memory model that is optimized to decide which tokens are worth keeping around, essentially, in long-context use cases where you have very long inputs and you need to do a sort of working-memory type thing within the transformer.
Usually this is sort of trained implicitly, I suppose, by just doing the usual training with long-context inputs. Here, they optimize this technique to focus on the most relevant information for individual layers throughout the neural net, and that improves performance across various long-context benchmarks. And this can be combined with any existing pre-trained large language model.
Yeah, it's an interesting approach, and definitely more inductive priors that they're adding into the stack here. So basically, your attention layers are going to look at your input and determine, okay, which tokens should the model base its answer on the most, assign a higher attention value to those, and move on. And there are a couple of issues with that.
Like, first of all, you end up having these massive KV caches, basically the caches that hold the data that you need to calculate those attention values, and then the attention values themselves. And the problem is, not all tokens are equally important. Some of them can be thrown away, and you're just taking up a whole ton of memory with a bunch of useless stuff that you don't really need to retain.
And so the question here that we're trying to answer is: can we build a model that selectively determines and throws away unnecessary token data in the KV cache? And that's really interesting. They're going to do this with an ancillary model, and they're going to use an evolutionary computing approach. It's a really kind of interesting game plan. The general intuition behind the workflow here is, in part, they're going to use Fourier analysis.
So this is essentially, let's say, the study of decomposing signals into wave-like patterns. What this is often used to do is identify repeating, periodic patterns that appear in some input. They're going to apply that to the attention values in the input sequence that you're analyzing. And so you might wonder, hey, why do that? Well, because there are patterns that can appear in those attention values, and those patterns make the attention
vector more compressible, right? Anytime there's a pattern, you can compress a thing, because a pattern by definition is a repetitive thing where, if you have just part of it, you can reconstruct the rest. So this is exactly the strategy they'll use. And it's all about figuring out, you know, how can I compress this to throw away data that I don't need, based on the token's frequency patterns, or how often it's being used,
or, sorry, how it's being used, the token's position in the sequence, and how it relates to other tokens through backward attention. And that's its own separate thing. So, typically, when you train an autoregressive model, for the current token that you're trying to predict, you get to base that prediction on all the tokens that come before, but not on the tokens that come after.
And you actually do know what those tokens will be, because during training you're typically pulling this from an existing sentence that has been completed. But the problem is that often the tokens that come after actually do have relevance to the current prediction. So, anyway, they set up a backward attention mechanism that lets earlier tokens look at later ones and get information from those as part of the whole scheme. So, anyway, it is really interesting.
I think it's another one of those papers that you'll want to dive into if this is your space of interest. But it's another way to tack on more complexity; there's more work for the compute to do. And I think that's a very promising path.
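As a rough illustration of the "keep what matters, drop the rest" idea, here is a simplified KV-cache pruning sketch. Note that it uses a hand-written attention-score heuristic as a stand-in; Sakana's actual neural attention memory models evolve a small network over spectral features of the attention values rather than using a fixed rule like this.

```python
# Simplified KV-cache pruning: score each cached token by how much attention it has
# received recently, and evict the lowest-scoring entries.
import numpy as np

def prune_kv_cache(keys, values, attn_history, keep_fraction=0.5):
    """
    keys, values: arrays of shape (seq_len, head_dim) -- the cached K/V entries.
    attn_history: array of shape (num_recent_queries, seq_len) -- attention weights
                  that recent queries assigned to each cached position.
    Returns pruned (keys, values) keeping the highest-scoring positions.
    """
    importance = attn_history.mean(axis=0)                 # average attention per position
    keep_n = max(1, int(len(importance) * keep_fraction))
    keep_idx = np.sort(np.argsort(importance)[-keep_n:])   # top-k positions, kept in order
    return keys[keep_idx], values[keep_idx]

# Toy usage: 8 cached tokens, 4 recent queries.
rng = np.random.default_rng(0)
K, V = rng.normal(size=(8, 64)), rng.normal(size=(8, 64))
attn = rng.dirichlet(np.ones(8), size=4)                   # each row sums to 1, like attention
K_small, V_small = prune_kv_cache(K, V, attn, keep_fraction=0.5)
print(K.shape, "->", K_small.shape)                        # (8, 64) -> (4, 64)
```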
And it's also an interesting paradigm in that you're sort of creating this additional module on top of a pre-trained model. So there's a base model, you can take Llama 3, for instance, and train this whole other thing that kind of operates independently, or kind of adds itself to the middle in a way. And if you do that, it can sort of be transferred onto other large language models without retraining on them.
And there are various benchmark numbers they give. The highlight is on very long context benchmarks like InfiniteBench. It does appear to help a lot, and I think this is one of these areas where there's been a lot of progress, but it isn't quite solved per se, these kinds of long-context settings, so this could be very significant. On to the lightning round. We begin with APOLLO: SGD-like Memory, AdamW-level Performance.
So, a bit technical, but we'll try to keep it understandable, I suppose. So, when you train a neural net, you're basically doing gradient descent, and there's a specific version, stochastic gradient descent, where you're sampling parts of the data. That's the fundamental way to optimize a neural net: you compute the gradients that you get from the errors on the outputs and you backpropagate them.
Well, there's a bunch of other detail you can add on top of that with optimization, and Adam is usually the optimizer people use. What that optimizer does is add a sort of memory over recent optimization rounds, and that allows you to know how big a step you should take with the learning rate across different weights.
Now, the issue with that: it makes you perform better, but it requires you to store that information from previous backpropagation rounds to compute the updated learning rate.
So the gist of this paper, as per the title, SGD-like memory, AdamW-level performance, is that they have this Approximated Gradient Scaling for Memory-Efficient LLM Optimization, APOLLO, which approximates the learning rate scaling using some fancy stuff that allows you to get away from all the storage required by Adam. Yeah, I think that's a pretty good gist of it.
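To make the memory savings being described here concrete, a rough sketch of the optimizer-state arithmetic; the shapes and byte counts are illustrative assumptions, and APOLLO itself combines this kind of channel- or tensor-wise scaling with the random projections discussed a bit further on, which this sketch does not implement.

```python
# Rough optimizer-state memory comparison (illustrative, not APOLLO's exact accounting).
# Standard AdamW keeps two moment buffers per parameter; a channel-wise scaling scheme
# keeps only one scaling factor per channel (e.g., per row of a weight matrix).

def adamw_state_bytes(num_params: int, bytes_per_value: int = 4) -> int:
    # Moving averages of the gradient and squared gradient -> 2 extra values per parameter.
    return 2 * num_params * bytes_per_value

def channelwise_state_bytes(weight_shapes, bytes_per_value: int = 4) -> int:
    # One scaling factor per output channel (row) of each weight matrix.
    return sum(rows for rows, _ in weight_shapes) * bytes_per_value

# Toy ~7B model approximated as a stack of 4096 x 4096 matrices.
shapes = [(4096, 4096)] * 400
n_params = sum(r * c for r, c in shapes)
print(f"params: {n_params / 1e9:.1f}B")
print(f"AdamW optimizer state: {adamw_state_bytes(n_params) / 1e9:.1f} GB")
print(f"Channel-wise state   : {channelwise_state_bytes(shapes) / 1e6:.2f} MB")
```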
You know, yeah, so I think there's a whole set of papers in this category recently, and for deeply strategic reasons, right? The big question right now is how do we scale AI training across a large number of geographically distributed training clusters. The reason is it's really hard to find a concentration of power, of energy, in one geographic location that will allow you to build one data center
that's like, you know, a gigawatt or a 10-gigawatt data center, as we discussed earlier. So as a result, there's all this interest in how we can set up distributed training schemes that require less data moving around between data centers across long distances. And so now, essentially, we're interested in: can we compress, can we reduce the amount of data that we need to pass back and forth across a system like this? So, enter the problem, right?
AdamW. This is the optimizer that's typically used today to train models at scale, or one of them. And the way this works is, for a given parameter in your neural network, the training scheme will remember, okay, there's this much of an update that we need to make to this parameter. Now, how much of an update did I have to make last time, and the time before? And if all those updates kind of point in the same direction,
it suggests there's a lot of momentum heading in that direction. So, you know, if they always said increase this parameter value very significantly, well, maybe that means you want to really ratchet up that parameter value, apply a bigger learning rate, essentially, right? Move it, make the update bigger, right? And then, conversely, if you find that there's less momentum... That's the basic premise.
But I just listed three different numbers that you need to remember for that particular parameter: you've got to remember the current update, the update last round, and the round before that, right? So together, that's like three times the model size in optimizer state memory that you need to keep and pass around and all that stuff.
So the goal here is going to be to say, okay, well, instead of tracking literally every single parameter in my model, could I, for example, zero in on one chunk of the network, what they call one channel, essentially groups of parameters that tend to behave similarly, and just have a single scaling factor, a single learning rate if you will, for that chunk of parameters?
And that way I can divide the amount of data I need to remember by the number of parameters in that chunk. And they're going to show that this in fact does work. It also applies to entire layers of the transformer; they do that in the tensor-wise compression they apply here. Anyway, this is really, really interesting. The way they do this is kind of trippy; we're not going to get fully into it.
This thing called random projections is used, which, by the way, Andrey, mathematically still blows my mind. You have a random matrix, you multiply it by your parameter update matrix, and you get out a smaller matrix, depending on the dimensions of the random matrix. But that smaller matrix will preserve some critical mathematical properties of the original matrix, even though you're multiplying by something random; it doesn't matter.
It's the Johnson-Lindenstrauss lemma. This was the first time I ran into it. Holy shit. Makes no sense. Random projections. What the fuck? Cool paper. Nice work. That's it.
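For anyone who wants to see the Johnson-Lindenstrauss effect for themselves, here is a tiny demo; the dimensions and the Gaussian projection matrix are arbitrary illustrative choices.

```python
# Quick demo of random projections: multiplying by a random Gaussian matrix shrinks
# dimensionality while roughly preserving pairwise distances (Johnson-Lindenstrauss).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4096))                  # 50 points in a 4096-dimensional space

d = 256                                          # much smaller target dimension
R = rng.normal(size=(4096, d)) / np.sqrt(d)      # random projection matrix
Y = X @ R                                        # projected points

def pairwise(M):
    diffs = M[:, None, :] - M[None, :, :]
    return np.linalg.norm(diffs, axis=-1)

orig, proj = pairwise(X), pairwise(Y)
mask = ~np.eye(50, dtype=bool)                   # ignore zero self-distances
ratios = proj[mask] / orig[mask]
print(f"distance ratios after projection: mean={ratios.mean():.3f}, std={ratios.std():.3f}")
# Typically the mean is ~1.0 with a small spread -- distances are roughly preserved.
```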
Yeah. Fun fact, there's a whole area of research where you can do random projections for hidden layers in a neural net. Usually you update all the weights in your neural net; well, you can actually just randomly initialize a bunch of them, and that still helps you, which is another one of these properties that's really curious. And real quick, I do like to do this every once in a while:
this paper is a collaboration between the University of Texas at Austin and AI at Meta. So I think there's been a lot of worry in recent years about universities not being able to do useful research, essentially, because you do need these crazy amounts of compute. Often it's been the case that people have interned at these large organizations like Meta or Google and did some work there while coming from grad school.
I think this is another example where, even if you don't have massive compute, or you have limited compute, you can do some really good, useful research. Alrighty, one last paper or research work. This is from Anthropic, and they call it Clio, a system for privacy-preserving insights into real-world AI use. So the idea here is you have a bunch of people using Claude, and presumably you want to be able to understand how they are using it.
Like, are they using it for coding, are they using it for learning, et cetera, et cetera. So this is essentially a framework that automates anonymizing and aggregating data, creating topic clusters from all these conversations without exposing any private information. Because if you're looking at conversations, right, someone might be sharing, oh, here is my medical information; you do not want to expose that as a specific thing that people are talking about.
So this is a technique to discover usage patterns. They revealed some interesting things; for instance, over 10 percent of conversations are focused on web and mobile application development. Educational purposes and business strategy discussions are also prominent, with 7 and 6 percent respectively. And, yeah, I think this is one of those things that you presumably definitely need as an LLM developer, to know what people are using your LLM for.
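For a flavor of what a pipeline like this might look like, here is a generic anonymize-then-cluster sketch; the redaction regexes, TF-IDF embedding, and k-means clustering are stand-ins for illustration only. Anthropic's actual Clio system reportedly uses Claude itself to summarize conversations and strip identifying details before clustering, which this toy code does not do.

```python
# Generic sketch of an anonymize -> embed -> cluster pipeline in the spirit of Clio.
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def redact(text: str) -> str:
    """Crude PII-scrubbing stand-in: drop email addresses and long digit runs."""
    text = re.sub(r"\S+@\S+", "[EMAIL]", text)
    return re.sub(r"\d{6,}", "[NUMBER]", text)

conversations = [
    "Help me debug this React component, my email is jane@example.com",
    "Write unit tests for my Flask endpoint",
    "Explain photosynthesis for a high school biology class",
    "Make me a study plan for the AP chemistry exam",
    "Draft a go-to-market strategy for a B2B SaaS product",
    "Summarize competitor pricing for our business review",
]

docs = [redact(c) for c in conversations]                 # anonymize
X = TfidfVectorizer(stop_words="english").fit_transform(docs)  # embed
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)  # cluster

for cluster in range(3):
    members = [d for d, label in zip(docs, labels) if label == cluster]
    print(f"cluster {cluster}: {members}")
```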
And, yeah, this would allow Anthropic to fine-tune their model effectively and also improve safety measures by identifying potential policy violations and coordinated misuse. All right, moving right along, we have policy and safety next. The first story is a little bit of a dark one, but I do think important to cover. It has to do with Character.ai, which, quick recap,
is a chatbot platform, a very popular one where people spend a lot of time talking to artificial intelligence characters. In recent months they've had two controversies and lawsuits. In one, a teenager who was very obsessed with Character.ai, seemingly or supposedly due in part to the influence of Character.ai, ended their own life, which was quite tragic. The parents say that Character.ai was partially at fault.
And there was another incident, also with harmful behavior that Character.ai may have augmented. So Character.ai is now stepping up teen safety. They are introducing a special teen model that aims to guide interactions away from sensitive content and reduce the likelihood of users encountering or prompting inappropriate responses. There are also classifiers to filter sensitive content and improved detection and intervention for user inputs.
And I think this is one of those things that is specifically very important for Character.ai, but also in general: as you see more and more people interacting with AI, and interacting in more and more intimate or human-like ways, it's inevitable that you'll see more stories of the sort where a person was maybe erroneously encouraged to do something bad, or motivated in a way that probably shouldn't have been the case.
This is another area of AI safety that maybe hasn't been explored too much, like the psychological influence that AI models may have over people. So, a very real example of that already happening in the real world, and this company particularly needing to tackle it.
Yeah. Yeah. And I mean, you know, this is one of those areas where you might expect some regulation pretty soon, I would imagine. You know, Congressmen have kids who use these tools, and so I'd expect them to be fairly sensitive to this. There's also just, like, the challenge of looking at children as, how would you put it, canaries in a coal mine for adults, right?
Like, you're talking about, say, autistic teens now, but as the systems get more persuasive, we have some really fundamental questions to ask about where the interaction of any human being with an AI system goes in a world where you can be convinced
of a lot of things by the chatbots you interact with. You know, there's a long tail of humans at various stages of life who might find this stuff really compelling and be induced to do bad things as a result. So, really hard to know where this all goes, but it's at least, you know, good that there's now pressure
to move in this direction. There is a notice that says you have to be 13 or older to create an account on Character.ai, and then they do say users under 18 receive a different experience on the platform, including a more conservative model to reduce the likelihood of encountering sensitive or suggestive content. But age is self-reported.
So, you know, I think it's an open question as to how effective those sorts of measures are. Going beyond that does require, though, thorny things like, not necessarily proof of identity, but proof of age at least, at a more compelling level. And so there are privacy implications there too. It's just a really hard problem to solve. Facebook ran into this early on.
Uh, when they were trying to prevent people from using it, uh, under, you know, under 13 years of age and other platforms have too. So, just challenging problem and, and unfortunate reality of, uh, the current state of chatbots.
There's a bit more here as well worth mentioning. So this is partially an issue of potential encouragement of bad behavior, and another aspect of this is addiction, where in a lot of cases, especially in the cases here, the teenagers were, you could say, obsessed, or you could say addicted, to talking to these AI characters, spending hours and hours talking to them.
And this announcement from Character AI is coming almost immediately after another lawsuit was filed; this one was filed this past week. As you said, in this one there was a case of a 17-year-old boy with high-functioning autism who was spending a very big amount of time talking to Character AI and supposedly was encouraged to be violent towards his family, his parents, things like that.
So, another aspect of this is that people might really get addicted and seek companionship and social support from AI in a way that isn't healthy. That's another aspect these sorts of platforms really need to start tackling, and, as you say, regulation might need to address as well. And next story, now moving to policy: the title is What Trump's New AI and Crypto Czar, David Sacks, Means for the Tech Industry.
So, as that implies, the news is that there is going to be an AI and crypto czar, David Sacks. This is a bit of a weird one. This is not sort of an official role; there's not going to be a Senate confirmation for this appointment. It's going to be a part-time role. He's going to keep his business position as someone who works in venture capital. David Sacks, for the record, is a pretty notable person and hosts a very, very popular podcast called All In.
He's one of the hosts of that podcast and has been a big supporter of Trump. So, what would this mean? Presumably, of course, very business-friendly approaches to AI and crypto, a very pro-industry approach. He also has expressed support for integrating AI into national security and defense. And, with regards to crypto, just quickly mentioning it, it's also going to be the case that there's going to be relatively little regulation, let's say.
Yeah, it's really hard to tell what the left and right bounds of this position are going to be. It doesn't fit the standard mold. And if you look at, for example, the Department of Commerce, they have a workflow that's just completely different, and it doesn't have a way to interface with this position. And so you might naturally wonder, like, what is going on?
I mean, this article speculates that it may be more about relationships than conventional formal channels of influence over departments and agencies. But at the end of the day, this does mean that Sacks is going to be in the White House and influential, certainly on AI and crypto. One of the questions, too, is how far does this extend into the national security sphere? I think that is probably the core question here.
It seems very much as if, especially given the remit is AI and crypto, this seems like a very industry focused thing when you start to think about, okay, but you know, what about the national security risks associated with the technology?
Not that he won't be a voice, he presumably will be, but there are probably going to be other voices at the table as well. And finally, I mean, it highlights, as the article points out, and as is quite apparent to people following the space, that there are two different camps in the White House right now. You've got the Marc Andreessen, sort of, like, David Sacks
camp of, hey, let's get AI developed and, basically, you know, who cares to some degree about the risks, or, that's a bit of a caricature, but more like, the benefits far, far outweigh the risks, and let's forge ahead. And then you've got Elon, who obviously has talked a lot about the risks from weaponization, loss of control, that sort of thing.
So how exactly that all shakes out, that's going to be an interesting thing to track. And the David Sacks appointment is, yeah, again, really hard to pin down, and that may be the idea. You know, Trump may just want him around to offer input. I do notice his maximum number of working days a year is set to 130, because he will be a special government employee. So, part time.
There's some controversy flagged here that that status doesn't require him to divest or publicly disclose his assets, which leads to potential conflicts of interest and things like that. So, anyway, kind of an interesting one to watch, and we'll see. I mean, I think a lot of this has yet to take shape, and the administration probably doesn't even know what his role will be. They'll just sort of play it out and see where the influence ends up lying in practice.
On to the lightning round, now moving on to some research. We have a paper, Frontier AI Systems Have Surpassed the Self-Replicating Red Line. So self-replication is one of these concerns with regards to advanced AI.
Essentially, if you have an advanced AI model, well, maybe you can keep it under lock and key, keep it controlled, not allow it to access the web, et cetera. But if the AI model is able to self-replicate, to make a copy of itself somewhere else and sort of unleash itself onto the world, that could lead to bad things if a model then decides it should do something opposed to human interests.
So this is one of the common things to watch out for, something that Anthropic, for instance, evaluates continuously, whether the new generations of models are capable of it. In this paper, they show that in 50 and 90 percent of experimental trials, respectively, with several models, Llama 3.1 70B and Alibaba's Qwen 2.5 72B, the models are able to succeed in creating a live and separate copy of themselves.
What this means in practice is they are able to generate the code and take the steps necessary to deploy another instance of themselves, which in a way isn't necessarily surprising. I think if you have been working with these things, you know that if you are set up to do it, and you sort of are allowed to self-replicate, this is the kind of engineering that the advanced models should be getting to the point of being able to do.
Uh, so yeah, not entirely surprising, but definitely worth, uh, taking a note of. And Jeremy, I'm sure you have more to say on this.
Yeah. I mean, I've seen this paper circulating around quite a bit. And to your point, I think there's a bit of overhype on what exactly, well, I won't call it overhype; it all depends on what you mean by self-replication. One of the first questions you might have looking at this, if you're tracking the space, is you might remember the company METR that does these self-replication evals for OpenAI and other companies in the space.
And they've consistently found with models to date that, yes, they are getting better, in some cases much better, at doing self-replication type stuff, but they're not really able to do it well, and we're definitely not there quite yet. So the question then becomes, what is the difference between this and the METR evals? And it comes down really to how you define self-replication. METR tends to look at deep replication that includes, like, model weights and architecture.
This one is more about, as you said, copying files, launching processes, like the engineering side of getting this done, which can still get you a working system. And to the extent that you care more about system automation than true deep AI replication, then the threat model actually kind of holds.
But certainly, you know, there are other differences. Like, the METR evals tend to involve a more constrained environment, kind of less access to things like system commands, file operations, third-party tools. And the authors here really worked a lot on their agent framework, the
kind of wrapper that sits around the model to help, you know, turn it into an agent; METR, at least in its most recent tests, hasn't been able to do that much work on that piece. So I think it is notable. It's also mixed, in that some of their more concrete scenarios do come with a lot of kind of hard-coded rather than native AI behavior, where they deliberately prompt the model to do certain things.
And then they sort of marvel at the fact that it does those things successfully, which, mind you, is a legitimate finding, but it remains true that this is with deliberate prompting. So it essentially comes down to which threat models you're concerned with the most. If you're concerned with sort of an autonomous, AI-driven, independent urge to self-replicate, this won't scratch that itch; that is a power-seeking measurement. That's not what's being measured here.
That's not what's being measured here. Um, what's, what they're really looking at more is the capability dimension itself. And, um, that is, you know, if you're, again, if you're concerned with this general, uh, threat model, yep, this might be a modest update, but, um, I don't think it's anything that, as you said, anyone's going to really be surprised by the capabilities, at least. Uh, given that we, you know, we've seen these models do similar things in other contexts.
That's right. Yeah. So it's one of those cases where you really should read beyond the headline, which sounds a bit serious, to the details. Next up, getting back to geopolitics, as we often touch on. The title of the article is: Chip war: China launches antitrust probe into Nvidia in a sign of escalation. So there's an investigation that focuses on Nvidia's 6.9-billion-dollar acquisition of Mellanox Technologies, with the claim that this might be violating China's anti-monopoly laws.
Monopoly being, you know, in case anyone doesn't know, probably most people know, but: you're a dominant player in some industry and are stifling competition. So this deal happened back in 2020. It was approved by China, but required Nvidia to supply products to China under fair and non-discriminatory terms. And as you might expect, this could be an aggressive measure by China to retaliate against U.S. policies.
And Nvidia's stock did take a hit, a 1.8 percent drop, following the announcement of the investigation, without anything regulatory even happening yet.
Yeah, I think this is, you know, a pretty standard CCP response to export controls hitting. Like, we've just had a tightening of export controls around, as we talked about last week, high-bandwidth memory, some lithography equipment exports as well, things like that. So, China going tit for tat. This is in line as well with their restriction of exports of rare earth minerals.
They're really looking for all the ways that they can try to frustrate American companies and American AI efforts. You know, this is all part of the reason why the solution to this was always, and I won't say it would have been politically feasible, but the solution to this was always to clamp down once, hard and decisively, on exports to China, you know, back in like 2019, 2020. Again, not politically feasible.
But what we're doing is we're playing this kind of losing game of whack-a-mole, where you try to patch up one gap and then another appears. And then every time you incrementally increase the threshold of export controls, now the CCP is going to do a retaliatory action. So if you did something decisive early enough, who knows, maybe you could have obviated some of this. Then again, you know, China has less to lose in that context.
So that's really what we're getting at. The export controls are actually really starting to work. We've seen a number of indications of that, and this is now really getting under their skin. They're also trying to posture ahead of the Trump administration to try to make it seem like, oh, you know, if you come in with stronger sanctions, we're going to bite back even harder type thing. Um, you know, non, non negligible concern, especially on rare earth exports.
The U S is just like, terribly positioned on that stuff. And that's just a self inflicted wound. Um, but it can be fixed, uh, with, with the right deregulation and it can be fixed with anyway, the right investment and focus. But this is just, you know, sort of standard fare and something that I'm sure the administration actually expected going in and stuff like this.
And speaking of export regulation, the next story is about another territory that's been, let's say, unclear, a bit of a gray zone. It seems the US has cleared the export of advanced AI chips to the UAE under a Microsoft deal. So there is a Microsoft-operated facility in the UAE as part of a partnership with G42, which we've covered previously. There have been some big investments there: Microsoft invested $1.5 billion in G42, which gives it a minority stake and a board seat.
So they're pretty deeply invested in this organization, which is part of the UAE. And it has been a question mark as to what the response of the U.S. government would be, as there are also some potential Chinese ties to G42. And so it seems that there is an export license. It does require Microsoft to restrict access to this UAE facility by personnel associated with nations under U.S. arms embargoes or on the U.S. entity list.
So, essentially, you get the export license, but you still have to respect the restrictions that have been placed on China.
Yeah, apparently the license that's been approved requires Microsoft to prevent access to its facility in the UAE by personnel who are from nations under U.S. arms embargoes or who are on the entity list, the famous entity list of the Commerce Department's BIS, the Bureau of Industry and Security.
This is the list that contains companies like Huawei and YMTC, some of the big players in the Chinese ecosystem, and frankly it should contain a lot more. Frankly, it should probably be a whitelist and not a blacklist, but I digress. So right now all these requirements are being added to essentially prevent this stuff. It's kind of interesting, right?
If you know the world of policy, and arms control policy in particular, this starts to vibe a little bit like a shade of ITAR. ITAR is essentially a counterproliferation policy. It says: if I give you a special technology, you are only allowed to pass it on to other people who are ITAR-approved, if you will, and if you fail at that, then you're in big trouble, right? So the idea here is that they're passing this forward, but saying, hey, you can't feed this forward to people who are on the entity list, who aren't screened in.
It's kind of interesting because it is a step in that direction. I think one of the things you need to look at from a national security standpoint is officially classifying AI as a dual-use technology under ITAR, meaning more advanced AI systems, not the kind of general-purpose ones we have lying around today.
So anyway, all very interesting. The restrictions cover people physically in China, the Chinese government, or personnel working for any organization headquartered in China. So it's clear what is in the target zone here in terms of G42.
And on to the last story, moving back to the US: the White House has created a task force on AI data center infrastructure, as Jeremy mentioned earlier in this episode. So this is going to coordinate policy across the government and maintain US leadership in AI technology, which is, you know, the party line. This will involve, of course, the Department of Energy. They will create an AI data center engagement team and share resources on repurposing closed coal sites, apparently.
The U.S. Army Corps of Engineers will also identify permits to expedite AI data center construction. And there's also some stuff here on industry exports. Yeah, it seems pretty good, very much in line with what is needed to expedite and enable these very complicated data center constructions to take place.
Yeah, the big challenge here is that this is a whole-of-government issue. So you have to have coordination across the Department of Energy and the Department of Commerce, and you've got increasing national security considerations to be brought in.
And so what you're essentially seeing is the government recognizing that and saying, oh crap, we need this. So the National Economic Council and the National Security Council, by the way, are councils that advise the president on the issues of the day. On the National Security Council you usually have a bunch of fairly prominent national security people, and then the staff, the NSC staff, does a lot of the key work.
So essentially they're all coordinating together at the White House level to solve problems like: how do we deregulate in a strategic way? Presumably the Trump administration is going to be even more aggressive than this, especially on environmental deregulation, things like that, things that are blocking the development of these new builds, the development of new power plants, the national security vetting of sites, and so on. So, yeah, I think it's interesting and noteworthy that we're now at the point where this is becoming a White House priority. And there's a lot of talk as well about getting the military, in various forms, to support here. And this is, yeah, as you say, the Army Corps of Engineers, right?
So now the DOD is involved. So yeah, a very wide-ranging effort here.
And that is it for this episode; we wound up with a bit of a long one. Lots to talk about this last week. Thank you so much for listening, especially if you did make it to the very end and are listening to me now. That's impressive, you made it through an entire episode. As you probably already know, you can find the links to articles in the episode description. You can also go to lastweekinai.com for that, or lastweekin.ai, where you get the text newsletter.
As always, we appreciate any comments, any feedback. We try to read those, even if we don't mention them on the show. And we do appreciate reviews, you know, getting those five stars always feels nice, but more than anything, we appreciate people listening. So do make sure to keep tuning in and enjoy the AI outro song.
Get around, it's time to cheer, Episode 93 is here. Open eyes, sail, ship, NASA, what a sight. Gemini 2 shines bright in the starry night. AI agents surf the web, browsing all so free. New discoveries and joys in the world of AI. 'Tis the festive season, let's raise a joyous cry. Stories and folktales of progress made, in the AI realm where wonders cascade. Ships must bring joy to coders and breeds alike. AI in every corner, changing day to night.
Gemini destroyers light up the tech sphere. With every leap and bound, the future grows near. AI agents learning with each click and scroll, deriving new narratives, bringing us all. AI agents surf the web, browsing all so free. New discoveries and joys in the world of AI. Aaaaaaaaaaaaaah. To every new day, as snowflakes fall, our ballerinas dance with glee, wrapping our futures for the world to see. Sing the lullaby for our digital knights, steering the path with luminous insights. Let the joy of progress spring throughout the land. Hey, I'm a-thriving, like, go where we correspond!