#171 - Apple Intelligence, Dream Machine, SSI Inc

Jun 24, 2024 · 2 hr 4 min · Ep. 210

Episode description

Our 171st episode with a summary and discussion of last week's big AI news!

With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris)

Feel free to leave us feedback here.

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at [email protected] and/or [email protected]

Timestamps + Links:

Transcript

Andrey

Hello and welcome to the latest episode of Last Week in AI, where you can hear us chat about what's going on with AI. As usual in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI text newsletter at lastweekin.ai for the stuff we did not cover in this episode. I am one of your hosts, Andrey Kurenkov.

My background is that I studied AI at Stanford, and I now work at a generative AI startup that we'll actually chat about a little bit this week.

Jeremie

Ooh, okay. And hold on, a news story? We've got a news story about it?

Andrey

Yeah yeah yeah.

Jeremie

All right, this is going to be a good week. I'm excited to hear about that. So yeah, I'm Jeremie, obviously the other co-host here. I am the co-founder of Gladstone AI, which is an AI national security company. And yeah, I'm really keen to dive into this one. We did not do an episode last week, and so we have two weeks' worth of stuff to catch up on, and it's been an insane two weeks, so it's, like, unusually dense.

Andrey

Yeah. I was going to joke that we got lucky last week and nothing major happened, but that is of course not true. So apologies, listeners, that we did skip it. You know, life happens, but we are back and we will continue to try to keep this weekly. And real quick, before we do dive into the news, as usual, I want to give a shout-out to some comments and reviews. We had some really nice comments, more and more coming in on YouTube, giving us some nice notes about, like,

having a chill outro. It's fun to see that some people are listening to the outro AI song. A couple of reviews on Apple Podcasts: one of them saying great weekly review show, a couple saying that we need less AI doomer talk and being a little, let's say,

critical of Jeremie. We do appreciate the feedback, and I guess we'll try to keep discussing safety and alignment, but maybe keep the hypothetical x-risk doom talk in its own sort of section and in more topic-specific episodes, unless it comes up and is relevant to a news story, as it will be pretty soon in this episode.

Jeremie

Yeah. I mean, I think it's also, you know, it is tough because it's the zeitgeist, obviously. So it's sort of hard to talk about frontier AI without talking about, you know, the catastrophic risk piece, because that seems to be a consensus view among the top labs. But obviously we have a section for AI policy and safety. We'll make sure that we focus the discussion

there. I think what's been happening, too, is we've had a lot of big models, and sometimes when big models get released, like GPT-4o, it's hard to know which section to put them in as a result. But yeah, very much appreciate that. That's great feedback. We want to make sure that we're tracking what people are interested in hearing about the most. So, do appreciate that.

Andrey

That's right. And now moving on to Tools and Apps, starting with our first story. It is, of course, from last week: the big news story that happened was Apple's announcement of its AI stuff, which has been long awaited. So we now have a name for it, which I think we knew from a kind of preview that we covered a couple of weeks back: Apple Intelligence. And at this event they held, they sort of rolled out a whole bunch of stuff.

So there are many AI-powered features coming to the iPhone and Mac. Just to give you a quick high-level view, Apple's entire approach to AI is very feature-first, as opposed to model-first. So unlike something like Google or Meta, which have pushed out entire chatbots as an important part of their tooling, Apple announced many sort of smaller things throughout the ecosystem.

So, for instance, when you compose something, there's going to be a built-in kind of tooling layer in the OS where it can do standard chatbot stuff, like summarizing some text or analyzing the tone, things like that. Apparently Siri is going to get a big upgrade there, which I guess we all expected. One of the interesting things there is on-screen awareness,

meaning that Siri will understand the context of the apps you're using, and you'll be able to type to Siri and ask it to basically do stuff for you. So that's just a couple of things; there are many other things, like a fancy calculator they showcased. The other thing to mention is there was a big focus on privacy. So a lot of it is going to be on-device, running some of the small models that Apple has been training, or it'll be in this new private cloud setup.

So very much in keeping with Apple's philosophy.

Jeremie

Yeah. And that whole cloud setup, I think, is really the story behind the story in a certain sense. The hard thing for Apple, like for all these companies, to accommodate the greater scale that's going to be needed, is building out all the hardware infrastructure, all the cloud infrastructure that they need to

be competitive. And so one of the things that we've learned fairly recently is that Apple has actually ramped up, in a very significant way behind the scenes, their M2 Ultra SKUs — basically their latest chip that they're building themselves. It's actually a fusion of two chips together using something called UltraFusion that Apple set up.

It's not unlike — we talked about this before — Nvidia's Blackwell, the B100 chips, where they're kind of fusing them together to make B200s. So, same idea coming up again here. The interesting thing is that Apple's hardware is not great for pure data center GPU use cases, and the reason is that they need to use their cloud for a lot of different purposes. They're focused, like you said, on delivering applications; it's not all about AI.

So they're trying to do this sort of dual-use, sitting-between-two-chairs thing for their cloud GPUs. And there are all kinds of interesting questions there about, you know, what are you going to optimize that hardware for in the long run, and what it needs to be optimized for.

But anyway, really interesting to see Apple ramp up that M2 production, because there hasn't been a ramp-up in demand for M2 laptops — that's been fairly constant — so all this extra production that Apple's engaging in is really for those data center chips, to increase that footprint. It's going to allow them to go in the direction of saying, yeah, everything sits as much as possible on Apple machines.

Obviously, stuff that they send out to ChatGPT — which they will be doing, by the way — is, as they've quite loudly said, going to be sort of by permission only. So you'll be asked explicitly: do you want your data to go to ChatGPT or whatever third-party tools? So Apple is definitely trying to find a way to sit in between the sort of "let's give you the latest AI capabilities" but also preserve that privacy angle as much as possible.

This seems like a really good middle ground, but we'll see what the performance ends up looking like on-device.

Andrey

That's right. And speaking of on-device, these features will only be available on the iPhone 15 Pro and iPhone 15 Pro Max, and on iPads and Macs with M1 or later chips. This is all starting with iOS 18 and macOS Sequoia, so most people are going to have to wait a little bit for those OS versions to come on board. I don't believe it's quite rolled out yet, but it will be coming pretty soon.

And OpenAI did also announce this whole partnership with Apple — seems like probably a big deal for them to have ChatGPT integrated into the OS layer, almost, where some of these writing tools for summarization and whatnot are going to be powered by ChatGPT, and Siri will be able to turn to ChatGPT for requests that it cannot handle. So lots of stuff coming out of Apple. We also have Genmoji, image generation, all that sort of stuff.

You can go to the link to get a full breakdown, but I guess the general story is the unveiling of Apple's AI strategy: these more localized AI features, a lot of focus on privacy, and integration of ChatGPT instead of developing their own gigantic models. And personally speaking, I think it's a really smart strategy, and people seem to like it.

Jeremie

Yeah. It also speaks to Sam Altman's business savvy. I mean, this guy has just brokered a deal with Apple after partnering with Microsoft in, like, the deepest of ways possible. It's kind of nuts — I never thought we'd see the day, but here we are, Microsoft and Apple partnering so deeply with the same company. It speaks as well to the value that they see in the OpenAI product line. Apple is not going to be constraining itself to just partnering with

OpenAI going forward. They have made the case that, you know, this is, in their view, the best chatbot on the market, so that's why they're starting there. But they plan to go to other places — you know, Google and Anthropic and so on — for more partnerships.

Andrey

And Apple did have negotiations with Google, so it's interesting to imagine why things happened the way they did. But regardless, that is what we're seeing. And next story — the title of the article is "'We don't need Sora anymore': Luma's new AI video generator Dream Machine slammed with traffic after debut." So yes, there was a kind of viral new AI tool last week, which we haven't seen in a while, and once again it has to do with video generation.

So Luma AI, which is a pretty new company — I think they have only been around for 14 months — has launched this free-to-use, publicly available beta, Dream Machine, which takes your text prompt and generates some video relatively quickly, I think in only a couple of minutes. They say that you can actually generate 120 frames in 120 seconds, amounting to, I guess, just a few seconds of footage. But there was so much demand that some users had to wait hours for the queue to clear out.

And I played around with it. It's pretty good. It's still not Sora level — there are still a lot of pretty obvious artifacts going on, a lot of AI weirdness in this — possibly because, I don't know, it's hard to make an AI video generator without stepping on some toes, I think, just in terms of training data. But regardless, yeah, this is a trend: we've been seeing more and more AI video generation, and I'm sure we'll keep seeing it. And Luma has now positioned itself as a major player in that space.

Jeremie

Yeah. I mean, one of the challenges of comparing this to Sora is that we really don't know what Sora is capable of, right? It's all sort of under wraps. There have been tests of Luma's system that are reported in this article, and I'm curious, Andrey, if this lines up with your experience. They were saying that the video that was generated is extremely smooth — they say non-jittery, high-resolution, highly detailed assets.

That's all good, but they did flag that there's, as they put it, "only sporadic accuracy in terms of depicting what we asked in our prompt." So prompt faithfulness seems to have been an issue. Is that something you ran into?

Andrey

I did, yeah. I tried to ask for, like, two robots podcasting in the future, and —

Jeremie

Wonder who has plans.

Andrey

The result did have sort of the broad strokes of that, but it didn't quite capture it. I think I asked for puppies playing with kittens, and I only got kittens, or only puppies, or something. So that was a pretty obvious failure. So yeah, I can say that it does seem to have that issue, and it did have some of the obvious sort of AI things — blurriness, things kind of melting into other things, and whatnot. So definitely not quite there yet. But another thing to point out is, with Sora,

it was never quite clear how much compute they were using — it could well be they were essentially rendering for minutes at a time to get a few seconds of footage. Whereas with this, it was near real time, which is very hard; video is harder than image generation. So definitely a fun thing to play around with. And I'm not sure if it's still public — I think it is — so if you want to just see what this thing can dream up, you can go ahead and Google "Dream Machine" and go try it out.

Jeremie

Well, to your point about the idea of it being more of a lower-compute system, we also have a bit of a hint of that by virtue of the fact that Luma has only raised about $70 million. The most recent round, I think, is the $43 million Series B, fresh as of January. So there isn't a whole lot of compute here to just be pouring out between inference time and training time.

So this is a pretty lightweight model. No matter what, there's no way they're doing this on anything like a true foundation model budget. Which is kind of interesting, right? Not something that I necessarily would have expected, especially after having seen the fanfare that went into Sora. I'd be really curious how the number of flops compares between the two models, to see which one was more scaled and by how much.

But this definitely seems like — if it works at this scale, and they're serving it for free in a demo, and the generation rate is this high — a pretty efficient model and an efficient training run as well. So, end to end, that's pretty cool.

Andrey

Yeah. And on to the Lightning Round. The next story is once again about video generation; this time it's about Runway, and they have also unveiled a new AI video model called Gen-3 Alpha. This new model is capable of ten-second-long clips, and it's extending on their previous Gen models — I think the last one was around the middle of last year, so it's been a little while, as you might expect.

They say that you can generate high-quality, detailed, highly realistic video clips up to 10 seconds in length. The initial rollout, at least, will support five- and ten-second generations, although it might be that you can make longer videos in the future. And they say that a five-second clip will take 45 seconds to generate, and a ten-second clip takes 90 seconds. And although there's no precise release date yet, they did say that it will be available to subscribers of Runway soon.

So yeah, Runway — one of the leading sort of AI video editing tools that's been around for years — trying to stay in the fray and stay, I guess, competitive as more and more players get into this space.

Jeremie

Yeah. And I really think, in just the same way as we've seen latency become a real issue — or a really key metric, rather — to track in text-to-text, where really high latency makes it kind of brutal because you have to wait for the tokens to stream in, it matters especially for text-to-audio.

You want the model to have really low latency so that when you type, the audio comes out as quickly as you're typing, or vice versa. Once you start to use these things for audio-to-audio interaction — giving commands to your system — it becomes really important for that natural interaction.

That's a big part of what makes ChatGPT's audio mode, for example, work really well. I think something really interesting is going to happen when the latency of text-to-video starts to drop significantly, right? You can imagine if you have, for example, a five-second clip that takes, you know, five seconds to generate. If you start to get to that point, think about it

as almost like buffering, right? For those of you who remember the early days of Web 2.0, when you were waiting for videos on YouTube to load, the same thing happened there: all of a sudden you could watch the video at the same rate that it was buffering, so you could have a continuous viewing experience. Well, same thing here. If you're able to get these videos to generate with a latency low enough that they generate as fast as they're watched,

that's a qualitative shift. And you start to get into really interesting things like custom YouTube, if the quality becomes high enough. So anyway, I think this is a really interesting dimension to track, and that number has been coming down really fast. I'm really curious to see, you know, what are the economics of this, and what's the quality trade-off going to be like?

But I think with more hardware coming online over the coming months and years, this is destined to head the way of YouTube.
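A minimal back-of-the-envelope sketch of this buffering point, in Python. The clip lengths and generation times are the ballpark figures quoted in this episode, and the 24 fps frame rate assumed for Luma's "120 frames" is our assumption, so treat the numbers as rough:

```python
# Rough sketch: when does video generation become "streamable" like YouTube buffering?
# Figures are the episode's ballpark numbers; Luma's frame rate (24 fps) is an assumption.

def realtime_ratio(clip_seconds: float, generation_seconds: float) -> float:
    """Seconds of video produced per second of wall-clock generation time.

    A ratio >= 1.0 means generation keeps up with playback, so the clip could
    be watched continuously while it is still being generated.
    """
    return clip_seconds / generation_seconds

examples = [
    ("Runway Gen-3 Alpha, 5s clip", 5, 45),
    ("Runway Gen-3 Alpha, 10s clip", 10, 90),
    ("Luma Dream Machine, ~120 frames @ 24 fps", 120 / 24, 120),
]

for name, clip, gen in examples:
    r = realtime_ratio(clip, gen)
    status = "streamable" if r >= 1 else "must pre-render"
    print(f"{name}: {r:.2f}x realtime ({status})")
```

On these numbers, current systems are still well under 1x realtime, which is why the latency drop Jeremie describes would be a qualitative shift rather than an incremental one.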

Andrey

Yeah. And just looking at the example videos — you can look at them by going to the link — pretty impressive, I will say; it really kind of did blow me away, although once you play around with it, maybe you can break it. And next story: Leonardo AI image generator adds new video mode. Our last video generation story. Leonardo has primarily been about asset generation

for things like video games. They have now launched a new tool called Motion, which can convert those images into short video clips — it generates 3 to 4 seconds of video footage. All you really do is click a button and go ahead and get that video. One interesting note is that they do say this is an adapted version of Stable Video Diffusion. And according to this article, the results can vary in quality, but do have nice lifelike human movement.

And that's one of the, I guess, notes to take: Leonardo has raised even less funding than Luma — $31 million — but because of open tooling like Stable Video Diffusion, even smaller players can afford to put out tools like this.

Jeremie

Yeah. I'm really curious how many VC dollars are just being lit on fire at this point with some of the open source plays that we're seeing.

I can't imagine this is sustainable, where people just keep pumping out better and better — or in some cases not even SOTA, not even state-of-the-art — open source text-to-text or text-to-video models, things that fall in the middle of the pack, sometimes, I have to imagine, just for a headline. So, yeah, I'm curious how many of these startups are actually going to be around when we're talking about this, you

know, six months, 12 months down the line. But it seems almost like we're heading to a world where the competition in the space is going to be pretty intense, potentially in the fairly near term. And you know what that does to margins, right?

That drives your margins down. So eventually some of these companies are going to have to decide if they're in the business of building really powerful models or serving powerful models — how much of the stack they want to own and all that stuff. The dust is still settling, but there's a lot of competition in the space. Six months ago we were just not talking about this — I mean, we were talking about text-to-video as academic stuff, but

just the speed of the explosion — anyway, it's been something else.

Andrey

Yeah. And, I guess notably things like, Runway and Leonardo at least are not pre-revenue.

Jeremie

Right?

Andrey

Yeah, yeah. But when Luma is releasing a free version of the thing to play around with — like, that's some money, I think. So that's one of those moves that is pretty impressive. And one more note on this one: Leonardo also did announce a new feature called Real-Time Gen, which allows you to do AI image generation as you type, and then you can kind of test out ideas and iterate. So that's another kind of crazy mark of where we are.

It used to take like a minute to generate AI images back in the day, when it was like VQGAN and so on — back in the day, two years ago. And now it's real time. It's crazy.

Jeremie

That's that whole sort of latency piece, right? Latency is so important not just because of the YouTube-ification, if you will — your buffering speed being as fast as your viewing speed — but also because of the more inference you can do at inference time. You could actually have these models kind of do more inference during the generation process.

If you get to the point where you can generate video, say, twice as fast or three times as fast as you can watch it, then you can actually do several passes at generating the video and optimize it before it gets to you. Again, I think all of this stuff is going to compound and lead to some really wild experiences with text-to-video pretty soon, and obviously image-to-video here too.

Andrey

And next story, we've got one from Anthropic: Anthropic just dropped Claude 3.5 Sonnet, with better vision and apparently a sense of humor. So, I suppose similar to a GPT-4.5-style increment, we now have Claude 3.5 Sonnet, which, as we saw before, is pretty good. They say it actually beats Opus, and of course GPT-4o, Gemini 1.5 Pro, etc., on a bunch of benchmarks — the usual suspects. And they do say it's better on image-based tasks in particular, so it can, for instance, beat CAPTCHAs, as with most of these kinds of models.

The other thing they introduced is this feature called Artifacts, which allows Claude to run code snippets or display a website in the sidebar — something ChatGPT has had for a while. So if you write some code to generate a graph, now you can actually see that graph in a little sidebar on Claude. And yeah, just another example of how Anthropic is really trying to compete with OpenAI.

I've also seen a lot of ads for Claude popping up in San Francisco, in my feeds, etc. So they do have an uphill struggle just in terms of name recognition and trying to get people to use it. But yeah, they do seem to be rapidly putting out the features that ChatGPT has had.

Jeremie

Yeah. And, you know, in private conversations, folks from Anthropic have previously talked about how their strategic outlook here was to never be ahead of the pack, never release a true frontier model, because of course they are really concerned about safety. And I'm going to apologize here, because we have to talk about the safety piece — it's Anthropic, it's their entire business model. But you know, they've made commitments in private and sort of alluded

to this somewhat in public: that, yeah, we'll never have the leading model because we don't want to exacerbate racing dynamics in the space. We saw them test the waters — maybe arguably violating that principle — with Claude 3 when it was first released, Claude 3 Opus that is, beating GPT-4 in some ways. Claude 3.5 Sonnet seems pretty clearly above the fold in terms of the other options in the market, but certainly not

going to be, for example, the best of what OpenAI has internally or what Google DeepMind perhaps has internally. There are always delays, obviously, between what's built internally and what's actually released. But one of the big, big things here is the agent workflows — agentic, multi-step workflows — that they focused on with Claude 3.5 Sonnet. So they're saying 3.5 Sonnet operates at twice the speed of Claude 3 Opus.

So presumably that's latency at inference time, which helps with the agent workflows. They do highlight that this agent workflow piece is really one of the key advantages that they've seen over other models, including Claude 3 Opus. They highlight this agentic coding evaluation — essentially, you can think of this as a complex coding task that the model will have to solve in many substeps, and it'll have to farm out task execution to different instances of itself, for example.

So in that category of task, Claude 3.5 Sonnet apparently solved 64% of problems, compared to Claude 3 Opus, which solved 38%. This is a huge, huge improvement, again with a much smaller model — this is not the full-size model. Claude 3.5 Opus, by the way, as they say, is going to be coming out in the near term. But for now, it's much like how Google launched Gemini 1.5 Pro but didn't go all the way and launch their largest model in version 1.5 quite yet:

we're seeing Claude 3.5 Sonnet come out, but Opus 3.5 is going to come out only a little bit later — so a sort of interesting intermediate release. There are a whole bunch of interesting details that they get into. I think one of the things to highlight is not necessarily where Sonnet 3.5 is better than, say, GPT-4, but where is it worse? What are the areas where it's still kind of getting beat out? I had a look at just the paper and some of the data there.

So on MMLU — essentially a language understanding eval — you see a slight advantage for GPT-4o over Sonnet 3.5. This benchmark is a little bit broken and already kind of saturated, so the models are pretty close; there's not much going on there, I would say. But the interesting one was the MATH benchmark: there you still see GPT-4o with a pretty big gap, 76.6% on that versus Claude 3.5 Sonnet at 71%.

So there are still sort of interesting narrow areas where 3.5 Sonnet is not number one, but across a vast, vast range of these, it is. One thing we do know about the back end is that it apparently is not a huge lift in compute over, at least, Claude 3 Opus. Anthropic has a practice of running a certain set of extended safety evals if they train a model with four times more effective compute than the previous biggest model they trained. They have not crossed that

threshold, they say, with this model. So we're not looking at an absolute gargantuan monster here in terms of compute, but certainly the capability leap is really impressive. So there are some major, major algorithmic advances going on behind the scenes here. Needle-in-a-haystack is looking crazy, especially compared to Claude 3 Sonnet, the previous version of Sonnet — basically, it's close to 100% across the entire context length of 200,000 tokens.

So pretty impressive.
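A minimal sketch of the compute-threshold trigger described above — Anthropic says extended safety evals kick in when a new model is trained with more than four times the effective compute of the previous largest model. The function, units, and example values below are illustrative only, not Anthropic's actual tooling:

```python
# Illustrative only: a toy version of a "4x effective compute" trigger for extended
# safety evals. Units (training FLOP) and example values are made up.

def needs_extended_safety_evals(new_effective_flop: float,
                                prev_max_effective_flop: float,
                                multiplier: float = 4.0) -> bool:
    """True if the new training run exceeds the stated effective-compute threshold."""
    return new_effective_flop > multiplier * prev_max_effective_flop

# A run at ~3x the previous maximum stays under the threshold (the situation
# described for Claude 3.5 Sonnet); a 5x run would cross it.
print(needs_extended_safety_evals(3e25, 1e25))  # False
print(needs_extended_safety_evals(5e25, 1e25))  # True
```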

Andrey

And on to the next section, Applications and Business. The first story is, as it has usually been this year, about OpenAI: we have a spicy report that Sam Altman might turn OpenAI into a regular for-profit company. So this was based on some conversations; it was kind of, you know, fairly vague reporting. It hasn't come out more widely or anything, but I think it's pretty significant to point this out. OpenAI at the beginning was a nonprofit; then they shifted to this capped-profit model with like a weird

corporate structure, and they have this idea of, like, you can only get 100x returns or whatever. And now there is, it seems, on the table the idea of OpenAI as a for-profit corporation. Not much else to say right now — it's just sort of one report of Sam Altman mentioning it — but very significant if this turns out to be true.

Jeremie

Yeah. I think one of the things that I've seen written about since, adding some context to this, is apparently they're mulling turning it perhaps into a sort of public benefit corporation, along the same lines as, for example, Anthropic. So that would make it more in line with the industry standard of this very, very weird AGI industry. Still, some of the language people have used is like Sam Altman sort of mulling turning OpenAI

into this new kind of company. And the reality is, if Sam Altman can just mull this over, then functionally he can turn the company into whatever he wants. He clearly has extreme levels of influence over the behavior of the board, or at least had — you know, he was able to literally get himself hired back. He clearly has all the leverage in the world, functionally, over the board.

Because at any time, Satya Nadella pretty clearly has a standing offer to bring Sam Altman over to Microsoft if anything happens. And we've seen, obviously, the employees circulate that signed letter saying, hey, bring Sam Altman back. At this point, it's really unclear what material control — and this is a real point of concern —

the OpenAI nonprofit board actually exerts over Sam Altman, who himself said "it's important that the board can fire me; I think that's important." So at this point, when you can have Sam kind of openly mull, hey, we might go in this direction, we might go in that — unclear how much discussion there is behind the scenes with the board — but again, his influence over that board, his amount of leverage, is pretty, pretty

extreme. So I'd be surprised if functional controls can be maintained, at least in the near term.

Andrey

And the next story also concerns OpenAI, indirectly: we now know what Ilya Sutskever is doing. He had a big part in the OpenAI drama back in November of last year, when Sam Altman was briefly fired, and it was just announced that what he is doing is building a new company, Safe Superintelligence Inc. He's doing this with a couple of other people: AI investor Daniel Gross, and Daniel Levy, who has been part of the technical staff at OpenAI.

And in the announcement, which was a very brief one on Twitter, all it says is: superintelligence is within reach; building safe superintelligence (SSI) is the most important technical problem of our time; and we've started the world's first straight-shot SSI lab with one goal and one product, a safe superintelligence. So clearly this is still pretty early on, but it does have Ilya, it does have Daniel Gross.

So we'll certainly see something come out of this, if only because a bunch of investors will be shoving money into these people's hands.

Jeremie

Yeah, oh yes, that very likely will happen. Worth noting, Daniel Gross — who is Ilya Sutskever's co-founder in this — is a former partner at Y Combinator. He was actually a partner at YC when I went there; he was one of the guys who interviewed me when my company got accepted. But he is also a former head of AI at Apple; his company got acquired as an AI company.

All that. So really, really good at the VC fundraising dimension of this stuff. That being said — and again, we have no technical details on this — it's worth noting we don't know what the special angle is, we don't know what the strategy here is. However, in order to keep up with OpenAI, there are a couple of things we know SSI is going to have to do, and fast. So they have one of two options.

Basically, the first is if they want to compete with OpenAI's Stargate supercluster, which is slated to come fully online in like 2027-ish and internally is being referred to as the AGI cluster. If that's your timeline — which Ilya seems to be kind of aligned with — then you have to have a plan to match that amount of compute capacity on that time horizon. The Stargate supercluster is a $100 billion — with a B —

supercluster. So you need to find a way to build something like that — with the power supply, with grid infrastructure, because that's another issue — that rivals OpenAI's cluster. That's going to be about a million H100-GPU-equivalents, somewhere in that ballpark. That's a huge amount of money, and it is unclear that really anyone is in a position to raise that sum.

And even if they can — then, separate question — they're going to have to secure an insane allocation of GPUs from Nvidia. They're going to need, probably at that point, B100s, B200s, whatever they go with. But Nvidia is planning on shipping, I think, about 2 million H100 GPUs in all of 2024. So we're talking here about essentially half of Nvidia's entire allocation for the year 2024. Granted, this is for a build-out that's happening in 2027.

So a lot more capacity will be online then. But still, this is a big, big chunk of Nvidia's allocation, and a huge amount of money that would have to go into this. This is a very, very hard fundraising problem; this is not an easy thing. That's option one: raise $100 billion and run that competitive play. Option two is to invent a ridiculously compute-efficient training strategy, or some model architecture, that cuts down insanely on the required number of GPUs.
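To see why option one is such a brutal ask, here's a quick sanity check on the ballpark numbers cited in this segment — a roughly $100 billion cluster, roughly a million H100-equivalents, and roughly 2 million H100s shipped by Nvidia in 2024. These are the episode's figures, not confirmed specs:

```python
# Back-of-the-envelope arithmetic on the Stargate-scale build-out discussed above.
# All inputs are ballpark figures quoted in this episode, not confirmed specs.

stargate_cost_usd = 100e9             # "$100 billion, with a B"
h100_equivalents_needed = 1_000_000   # "about a million H100 GPU equivalents"
nvidia_h100_shipped_2024 = 2_000_000  # "about 2 million H100 GPUs in all of 2024"

share_of_2024_supply = h100_equivalents_needed / nvidia_h100_shipped_2024
cost_per_gpu_equivalent = stargate_cost_usd / h100_equivalents_needed

print(f"Share of Nvidia's 2024 H100 output: {share_of_2024_supply:.0%}")
print(f"Implied all-in cost per H100-equivalent "
      f"(chips, power, datacenter, networking): ${cost_per_gpu_equivalent:,.0f}")
```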

That is really, really hard as well. So, two hard possibilities here. Having said that, if there's anyone who's not Sam Altman who can raise $100 billion, it's probably going to be Daniel Gross. And if there's anybody — who is, well, I don't know, not Sam Altman, not Greg Brockman — who can figure out a more compute-efficient training strategy to do this stuff at scale and compete,

yeah, it's probably Ilya. That doesn't mean either is doable, but it does mean this team is really interesting. They're going to have to solve one or both of those problems in order to have a really clean shot at this, given that their whole play here is: we're not monetizing products on the way to artificial superintelligence, we're doing it as a straight moonshot, no apologies made.

We're going straight for it. So they've got to raise the funds and be able to do it in one shot — otherwise you can't raise a second round, right? You get diluted; it's just the way startup equity works, there's just not going to be room on the cap table left. And that's an interesting challenge. We'll have to watch this one for sure.

Andrey

Yeah, it's an interesting move. A part of me does wonder why Ilya didn't sort of follow Jan Leike in joining Anthropic and their superalignment team, because that is presumably kind of the same goal there of doing superalignment.

But it'll be very interesting to see what approach they take and what they do now after having announced this. Not only do they need to compete with the likes of OpenAI, there are other players like xAI, which is currently building out their compute reserves, and people like Microsoft also trying to train their own frontier model, and Meta has committed this year to getting a bunch more compute.

So all the big players are already trying to get as much of the GPU supply as they can. Being a new player in the space is going to be very hard, for sure.

Jeremie

Yeah. To your point about why he didn't follow Jan Leike to Anthropic: one possibility is just the mundane — like, he's used to being a founder, he wants to be a founder, and maybe they could have elevated him to that status or whatever, but that'd be pretty messy. The other possibility is just that Anthropic has a bit of a more nuanced play here. They're not just saying we're trying to build superintelligence

in one shot. They're saying, yeah, we're going to keep incrementally building, and we're trying to either prove that it's safe or prove that it's not safe — and if we prove that it's not safe, we're going to pivot our strategy to sort of warning policymakers and trying to shut things down, if we find out that it cannot be done safely under any circumstances. So maybe he has a differing view —

irreconcilable differences type thing. But to your point, I think it's awfully tough. Even Anthropic might be in a bit of a tight spot fundraising-wise, because they don't have that deep integration with a cloud hyperscaler, like Google that DeepMind has or Microsoft that OpenAI has. So anyway, really tough space, and yeah, like you said, so much competition in the AGI space, I guess.

Andrey

Yeah. And on to the Lightning Round. Next up, we've got just a couple more stories on OpenAI, because apparently that is like half the news these days. The next one is OpenAI welcomes Sarah Friar as CFO and Kevin Weil as CPO. So Sarah Friar is the new chief financial officer; she was previously CEO of Nextdoor, was CFO at Square, and has a bunch of other roles. Notably, she is part of the Stanford Institute for Human-Centered AI and was part of the Stanford Digital Economy

Lab. And then Kevin Weil is the new chief product officer; he has held VP roles at the likes of Facebook, Instagram, and Twitter, and also has a bunch of other roles to his name. So I guess a little notable given Anthropic also had a CPO announcement pretty recently. And it's kind of interesting to note that these gigantic companies that are raking in billions of dollars and spending tens of billions of dollars now have a CFO — I guess this is a new CFO, to be clear.

So, sort of a move of just getting more senior people into these roles, given the growth of these companies.

Jeremie

Yeah. Okay, Kevin Weil is kind of interesting, right? Because he had a bunch of previous positions, but one of them was as co-founder of the Libra cryptocurrency

project. I don't know if people were following cryptocurrency back in the day, but Libra was this big thing that Facebook famously trotted out and said, hey, we're going to have essentially a stablecoin pegged to the US dollar — a cryptocurrency that Facebook would not be directly running, but that some sort of industry consortium or whatever would be running. And it triggered all kinds of concern on Capitol Hill and got pushback to the point where

it was all canceled. I think its name was then changed to Diem after that. But anyway, that all folded and was sold to Silvergate and all that stuff — kind of interesting to find him pop up here. It's been a while since I've seen Libra in the news, so there you go. But yeah, all part of the sort of corporatization of OpenAI, right? This is the professionalization of the executive

class there, and maybe part of the reinvention that Sam is engaged in over there at the executive level. Because they've just had all these big transitions: they lost Ilya, and over the last year the board has been reset. The board now is a lot more sort of conventional and traditional corporate, if you will. So maybe not too surprising to see these things as a knock-on effect.

Andrey

And next story, the last one about OpenAI. There was a report that OpenAI has doubled its annualized revenue in six months, and it is now at $3.4 billion, up from $1.6 billion in 2023. Presumably, this financial growth is in part due to OpenAI's integration with Apple and also because of their new efforts on the enterprise front, which is where a lot of the money is. That $3.4 billion in revenue is pretty impressive, although this is, I believe, revenue, not profit.

And it'll be very interesting if we do find out what that profit piece is.

Jeremie

Yeah. I mean, we've been talking a lot about how there's this race to the bottom on the cost of serving up these more and more powerful models, just because eventually, when competition sets in, the thing you're competing on is how cheaply you can serve the same capability. And the reason I'm saying that is not necessarily that all these cutting-edge AI labs are going to be rolling out models at the same

level of capability. You may well have Anthropic or OpenAI or whoever ahead. But for the vast majority of use cases, you will reach a point where most use cases are already covered. So most of the actual money-making activity is involved in serving up use cases that were solved a long time ago with relatively

old systems. Once you enter that zone, then essentially all you're doing is serving as a kind of infrastructure service, and you're competing much more directly with open source, because then I can say, well, I can either use Llama 4 or Llama 5, or I can use GPT-4 or GPT-3 or whatever it is. And I'm literally just weighing these two options, one of which may be fully open source.

And that open source model may then be served up really cheap by a bunch of specialist companies that are really, really good at cheaply serving open source models. So this is a really interesting story in the background. I think for now, those cutting-edge use cases that get incrementally conquered by companies like OpenAI with the true frontier models still represent the vast majority of the economic value. That may start to shift over time.

And maybe that shift happens too close to AGI to make a difference, and at that point you're in a completely different domain. But this is an interesting kind of treadmill effect that you want to keep in mind strategically as things move forward — as more and more of this inference is involved in serving up older models where open source is already at parity.

Andrey

Next up, we're moving on to a couple of other big companies, and this next story is that the AI startup Adept is in talks with Microsoft. Adept is a big one; it has raised over $400 million in funding. Microsoft did say it has no plans to acquire Adept, but Adept has reportedly signed a term sheet from Microsoft. We don't know the particular details of whatever deal is going on, but if nothing else, it's pretty

impressive the degree to which Microsoft is just trying to get a hand in as much of the ecosystem as possible — possibly in a bid to sort of push back a little on the antitrust stuff by giving away money rather than acquiring competitors. But regardless, we saw them kind of take over Inflection AI pretty recently, and now they are going for this other pretty notable startup. Microsoft is very aggressively going all in on AI.

Jeremie

Yeah. I mean, reserving judgment for when the deal terms are public, I would say, whether we call it an acquisition or not, that does seem to be the substance of what we're talking about here. They're saying, yes, the deal would not be a standard acquisition, but it involves some of Adept's team moving to Microsoft. And this is really how it all starts — this is how it went with Inflection AI.

So I think it's fair to say Adept is being Inflection'd, or at least likely will be. And, by the way, Adept was really like the OG agent company — you know, AI agents that roam the internet and do things for you autonomously. They were doing that back in 2022; they had their Action Transformer — I think it was called ACT-1

when it first launched. And it looked really impressive and all that, as you might recall — we were talking about this on the podcast at the time — but they had not raised enough money to compete at scale with their strategy. They were going to need foundation models that were competitive with the likes of OpenAI and Anthropic, and scaling costs money. And I predicted, I will say, this exact outcome, and I will go on record right now:

I will predict this for a couple of other companies in the mid-tier here as well. I think this is a real risk for Cohere; I think Cohere could well get Inflection'd as well.

I think for a lot of companies in that orbit, if you do not have a deep partnership with a cloud hyperscaler and your plan runs through AI scaling — if you find yourself implicitly competing with OpenAI, with Anthropic, with Google DeepMind — the chances of you getting Inflection'd, the chances of you getting essentially acquired, are

dangerously high. And I think right now, frankly, investors, VCs, are frothing at the mouth and just, like, too excited about the prospect of scaling not working. But obviously that's a commitment to a certain view about scaling; I just think that the case for it is pretty strong. I thought there was a really interesting fleshing out of the details here in the article — I think it's just kind of worth reading this one section, and

then we can move on. They said: if Microsoft does use a similar deal structure with Adept to the one it used with Inflection, that could prove controversial. Microsoft hired Inflection co-founder and CEO Mustafa Suleyman in March, along with most of its research scientists and engineers, but did not formally acquire the company.

Instead, it paid Inflection's investors a $650 million fee to license the rights to Inflection's technology, and it did so even though Microsoft said it had little interest in the products Inflection had built. Again, this is, in startup lingo, what we call an acqui-hire: basically, a company is being bought for parts. Their actual product is not considered valuable, and you can see why — Microsoft can scale way, way, way more than Inflection.

The value is just not in the IP; it's in the people. So this is all part of the crazy race to AGI, apparently. But yeah, interesting to see. And again, I think there's a ticking clock on a lot of companies right now who are in this awkward middle ground where they're sort of square pegs — they're not tiny, they're not massive like OpenAI. They've raised a large amount of money, but just not enough to be competitive.

Andrey

And speaking of that category of company, next we have Mistral, which closed 600 million euros in funding at a 5.8 billion valuation and with a new lead investor. So I believe we covered reports that they were seeking this amount of funding, and now we have it: they actually got 468 million in equity funding, and then there was also some debt that raises it to 600 million. They got a new lead investor, presumably to help them get to

that level. And that makes it so they have secured over 1 billion euros in funding so far. And, you know, given the point about needing a lot of compute and needing a lot of money, it's pretty impressive — Mistral has done quite a good job in releasing pretty competitive models, as we've said. Although, to be fair, they are still in the Llama sort of

space — pretty far from competitive with GPT-4 and Claude 3 and so on — although there is a second tier there of players that are managing to build pretty capable models and release a lot of them, but not necessarily compete at the top.

Jeremie

Yeah. And again, I'm just sharing these biases in the interest of sharing potentially interesting information, but this is my hot take: I do think companies like Mistral are potentially in some trouble, because €600 million — this is euros, by the way — is a decent chunk of change, but it's just not enough to make competitive superclusters. At the pace things are going, I suspect that they're going to be

in trouble. And I don't know that they've figured out their open source business model — this is something that, at least, I'm still scratching my head about. I'll stand up here and say, if no one else will, that I am confused about what the big-picture strategy fundamentally is for a company like Mistral.

If you're just going to keep open-sourcing your models — frankly, I think Stable Diffusion, or Stability AI rather, has shown that this is a challenging business model to make work. It's not obvious how you actually monetize just serving open source models; that's really, really not clear in this market. But hopefully they find a way to crack it. They are, of course, the European, the French, champion of

AI. So at some point you could imagine some domestic subsidies, maybe, even, if it's viewed as a sort of national security asset. But for now, this amount of fundraising in this space — if you buy the AGI scaling thesis, and you don't have to buy it, right? You may not, so fair enough — but if you think that scaling is the path, this is a pretty dicey spot. Worth noting their new investor.

So DST Global is joining the cap table, and General Catalyst — a high-quality Silicon Valley firm — is also kind of doubling down to lead this round as well. I thought this was kind of interesting: DST Global is one of the biggest VC firms in the world. They've got about $50 billion in assets under management and a portfolio that includes Facebook, Twitter, WhatsApp, Alibaba, and so on. But it's been the subject of scrutiny for its ties to Russia.

And so this might actually lead to some interesting challenges down the line. There are a bunch of really high-quality VCs also joining — Lightspeed, Andreessen doubling down — and a bunch of French kind of national champions like BNP Paribas, which is a big French bank. So, you know, interesting.

We'll see what ends up happening here. But again, I think a bit of a challenge structurally, strategically, for Mistral going forward is going to be: how do you justify, how do you defend, not the $6 billion valuation, but, you know, the $60 billion one? How do you get up to OpenAI scale? Time will tell. And I fully expect to look like an idiot for any number of reasons in the future, but I'd be surprised if they ended up cracking that nut.

Andrey

Yeah, certainly. More than ever. We cannot know the future. We can only guess.

Jeremie

Amen.

Andrey

And next up, we leave America for a bit to go to China, with a story about Huawei claiming their Ascend 910B chip manages to surpass Nvidia's A100. As we have been covering for a while now, there's a lot going on with the US imposing sanctions and restrictions on China so that they are not able to get Nvidia GPUs, and there have been efforts to build a homegrown capability there, but that has been pretty stymied by these things.

So Huawei is one of the companies working on trying to build their own kind of equivalents of Nvidia's chips. And yeah, this new chip is supposedly reaching close to what Nvidia had, I think, maybe a couple of generations ago. But certainly it's not nothing, and this is still pretty significant if true.

Jeremie

Yeah, for sure. I mean, so they're claiming it does outclass the Nvidia A100 chip by about 20% in what they call training performance, right? Okay, that's a top-line metric. There's a lot in there — like, okay, what's the memory, what are the flops, all this stuff. So we don't actually know the specifics there, but we have this claim that, okay, it's 20% better broadly in training performance. That seems plausible.

We do know that it's based on SMIC, which is sort of China's domestic version of TSMC — TSMC being the world's top semiconductor fab. So SMIC is a distant, sort of not-even-runner-up, but they're China's domestic champion. SMIC's seven-nanometer process is the same one that was originally used to make Nvidia's A100 back in the day. Worth noting as well: 20,000 A100s were used to train OpenAI's GPT-4 model.

So we're talking about GPT-4-level, fully domestic supply chain capabilities in China. We do know that apparently Tencent and Baidu have been buying the Ascend 910B in large volumes, and there's been this increasing shift towards just buying domestically, which has been reflected too in Nvidia's bottom line. So Nvidia, as you said very rightly, Andrey, hasn't been able to ship their top-line GPUs to China due to US export

controls. And so they've had to basically fry some of the circuits on their H100 GPUs to create a downgraded version called the H20. That's the one Nvidia has been trying to sell into China, but they've seen very weak interest in the Chinese domestic market, and they've had to slash prices a bunch of times in response. Now we have a competitor chip apparently coming out of Huawei.

That really should kind of make it almost impossible for Nvidia to compete, given that they can't export their best systems. So yeah, that's one piece. But look at the macro — if you zoom out here, this is technology being developed domestically in China, on a supply chain that is, you know, three, four, five years old.

So as for the idea that they're going to be able to catch up with Western capabilities — the sorts of things that Nvidia and Intel have been pumping out — those companies have switched, by the way, to yearly cadences in terms of their new AI chip roadmaps, whereas previously it had been like two years, three years. So things are only ramping up in the West, and I think that gap between the West and China seems likely to keep expanding over time.

But still, very interesting and concerning that China has managed to onshore to this level of quality — if in fact they have, which I would guess they have.

Andrey

And one last story for this section. Not a big news story, but one I figured we'd cover, given that it is relevant to me. The story is that Astrocade raises $12 million for an AI-based social gaming platform. This was actually a while ago — the $12 million was a seed round — but it has now come out alongside a soft exit from stealth. So Astrocade AI, this is where I work. I work on this platform where we want to make it so anyone can make fun little games with AI.

And I'm not one of the co-founders — I joined after I finished my PhD — but the CEO actually came from my lab; he was there a little while before me. (Oh, no way.) Yeah. And it was officially co-founded by Fei-Fei Li, one of the advisors of my lab and a professor at Stanford, and, you know, a couple of other people. So, yeah, it's fun to see the company sort of having a

little bit of press. We are still very much in the pre-alpha stage, so the product isn't public, but now there's a bit of a preview for those interested. And, you know, hopefully it turns out well — games are hard, and it's not quite as simple as prompting a model — but we are working hard, and hopefully we'll cover a bit more news on this front on the podcast later this year.

Jeremie

So, guys, if you're listening to this, go to Astrocade, figure out the parts that Andrey is working on, and just, like, make him submit so many tickets for bugs and glitches. I'm being mean today. This is amazing — this is actually really cool. By the way, from the article, there are a whole bunch of interesting things about the funding: there's funding here from Nvidia Ventures, which is pretty wild. And then you've got, on the cap table, Eric Schmidt.

So, obviously, the former Google CEO. And, yeah, John Riccitiello — Riccitiello, okay, that's how you say it — so, like, the former Unity CEO, the Unity company. Let me see, you've got, like, the former CEO of Roblox. This is a pretty wild cap table, dude.

Andrey

Yeah, yeah. And it's an interesting space. You know, we haven't covered a lot of stories on AI in gaming, partially because, you know, you can't just have a model run and do it for you. So it's definitely been an interesting ride so far in my just over a year working there. But it's a fun thing to try to do, and hopefully the listeners do get more details on it going forward.

Jeremie

Can I just ask you — and I'm guessing listeners have had this question too — what specifically do you do in the company, or what are the problems you're working on right now? If you can say.

Andrey

Sure, I can discuss a little bit. So a lot of it has to do with taking natural language requests and converting them to, you know, the next modification in the game you're making. So I do use a lot of ChatGPT APIs to process requests. I do a bit of our backend in general, structuring the request handling. I do end-to-end stuff going from the game engine to the AI and back. You know, you've got to wear many hats in a startup, so that's been sort of the norm.

And it's been, yeah, interesting. I haven't done a lot of model training, partially because we do have all these good off-the-shelf tools, but there are some plans for that as well in the future. So lots of different things.
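To give a flavor of what that request-to-modification loop can look like in general, here is a minimal, hypothetical sketch — not Astrocade's actual pipeline; the model choice, JSON schema, and helper names are all made up — of asking an LLM for a structured edit that a game engine could then apply:

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK is installed and OPENAI_API_KEY is set

client = OpenAI()

SYSTEM_PROMPT = (
    "You translate a player's natural-language request into ONE game modification. "
    'Respond with JSON: {"action": "spawn"|"recolor"|"resize", "target": str, "params": object}.'
)

def request_to_modification(user_request: str) -> dict:
    """Ask the model for a structured edit the game engine can apply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # placeholder model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_request},
        ],
        response_format={"type": "json_object"},  # constrain the reply to valid JSON
    )
    return json.loads(response.choices[0].message.content)

def apply_modification(mod: dict) -> None:
    """Stand-in for the engine side: route the edit to whatever the engine exposes."""
    print(f"engine.{mod['action']}({mod['target']!r}, {mod.get('params', {})})")

apply_modification(request_to_modification("make the boss twice as big and bright red"))
```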

Jeremie

Awesome.

Andrey

Yeah. And moving on to projects and open source, another pretty notable story: announcing the open release of Stable Diffusion 3 Medium from Stability AI. So it was a little while ago that Stable Diffusion 3 was announced; it was not released at the time, I believe, and was only available commercially. Now we have Stable Diffusion 3 Medium, and this was apparently in collaboration with

Nvidia to some extent — there are TensorRT-optimized versions of the models — and it is released under the Stability Non-Commercial Research Community License, with a Creator License for commercial purposes. So yeah, we keep covering Stability AI because historically, at least, Stable Diffusion was a big, big reason why text-to-image blew up. Like, there are so many providers of text-to-image, not just the big players but a lot of other startups doing

it. And that's partially because Stable Diffusion came out when it did and basically kickstarted, or really accelerated, the ecosystem. And Stable Diffusion 3 is sort of comparable to the top-of-the-line tools we see from OpenAI and Google and so on. So it's pretty significant to see them still release more stuff, although not quite as open source as it used to be.

Jeremie

Yeah, it's actually really impressive. I'm just looking at the images that they show, and, you know, we've gotten to the point where you can give a prompt that says, hey, put — in this case they have glass magic potion bottles in an old abandoned apothecary shop — and then it says: give each of them a label, the first one a label that says 1.5, the second one a label that says SDXL, and the third one a label that says SD3.

And they are all perfectly labeled. That's something that the latest generation of models has started to do really quite well. At least if this is representative, it's really impressive. They're also kind of combining two different tests at the same time.

So one is the ability to write text in an image — to render text that's supposed to be on a signpost or something like that, in this case on a tag or label. But they're also testing the ability to assign the correct color to the right object as listed in the prompt, and that also is working out. That's a separate test that people usually use to stress-test these systems. Combining the two together here is a bit of a tour de force type thing.

So it looks really compelling — very high resolution, high prompt faithfulness. And yeah, another advance. But I think Stability, now that they're under new management, maybe they'll be thinking about their business model a little differently. We have yet to see whether they can turn this ship around: high-quality models are one thing, but you've got to find a way to translate this to the bottom line.

Andrey

Yeah. And it's interesting, this new Creator License for commercial purposes. It sounds like that's meant for, let's say, individuals — professional artists, designers, developers, and AI enthusiasts. So they are trying to make it so it covers personal and smaller commercial use, unless you're doing large-scale commercial use, in which case you need to contact them and license it.

And that's an interesting kind of approach to this whole open-source strategy: you release it, you let people use it for smaller business or research, but you don't let, you know, big business do it for free. And that does mean you get some of the benefits of open source, where in a distributed fashion things improve and people learn how to do things. And yeah, it is definitely an

interesting move. And one final note: another interesting thing is that in addition to the collaboration with Nvidia, they also collaborated with AMD to optimize it for their devices. So AMD is still very much trying to stay in the race. And the next story is about Meta, which is one of the other big sources of open source — historically, or at least for a couple of years now. The story is about a flurry of new AI models for audio, text, and watermarking.

So just a bunch of stuff all bundled together. Firstly, we have a new AI model called JASCO, short for Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation — quite a long name. It can take different audio inputs, like chords or beats, and use them to improve the final AI-generated sound. So you can hone in on the final sound beyond what you can provide via text.

But now you can also provide part of the music. The code will be released under an MIT license, and the pre-trained model will be released under a non-commercial license. So that's number one. Number two is AudioSeal, a way to add watermarks to AI-generated speech. We have already had watermarks for images; now we have the ability to add watermarks to AI-generated speech, which to me is pretty notable.

I don't think I've seen watermarks for speech yet, and that is maybe even more important than watermarks for imagery. And one more thing: they are releasing the multimodal Chameleon models to the public, again under a research-only license — the 7 billion and 34 billion parameter models. These Chameleon models deal with vision and text; we covered this, I believe, a couple of episodes ago with regard to the mixed-token training for these models.

So once again, Meta continuing to just open source as much as they can, including these pretty chunky models at 34 billion parameters.

Jeremie

Yeah. By the way, I really loved your exasperated Steve Jobs at the end there when you talked about Chameleon — "and one more thing." No, I mean, they're definitely going with a cluster bomb of different models here. On the Chameleon one, I thought one thing was interesting: apparently Meta has been saying that they will not release the Chameleon image generation model, quote, "at this time," and only the text-related models.

So, you know, who knows why exactly? But one common thing with text-to-image is concern over misuse and that sort of thing. So if they haven't quite ironed out their dangerous-prompt refusal stuff or whatever, that may account for it. But yeah, no, you're absolutely

right. And the idea of watermarking for audio — it's surprising in a way that it's taken this long, especially given the number of text-to-audio models that we have seen come on the market, and the obviously malicious use cases that we know these models have been put to. Right? We have covered a lot of stories of people using these for all kinds of extortion, scams, and things like that. So yeah, good for Meta for putting these things out there.

They have done a good job historically of putting out these sorts of frameworks, pieces of infrastructure, tooling to help people do things like watermarking or automated safety testing or whatever, at least for the more standard, traditional boilerplate risks like cyber risk and that sort of thing. So it's kind of interesting to have this one added to the set of tools.
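For intuition only, here is a toy embed-and-detect sketch of the general idea behind audio watermarking: mix in a key-derived pseudorandom pattern at low amplitude and recover it by correlation. To be clear, this is not how AudioSeal works (Meta's system uses trained generator and detector networks, is localized in time, and is robust to editing); it's just the simplest possible illustration of embedding an imperceptible signal and detecting it statistically.

```python
import numpy as np

def embed_watermark(audio: np.ndarray, key: int, strength: float = 0.01) -> np.ndarray:
    """Mix a key-derived pseudorandom pattern into the signal at low amplitude."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape)
    return audio + strength * pattern

def detect_watermark(audio: np.ndarray, key: int, threshold: float = 4.0) -> bool:
    """Correlate against the same key-derived pattern; a high score means 'watermarked'."""
    rng = np.random.default_rng(key)
    pattern = rng.standard_normal(audio.shape)
    # Normalized correlation behaves roughly like a z-score under the no-watermark null.
    score = float(audio @ pattern) / (audio.std() * np.linalg.norm(pattern) + 1e-12)
    return score > threshold

# Toy usage: one second of fake "speech" at 16 kHz.
clean = np.random.default_rng(0).standard_normal(16_000) * 0.1
marked = embed_watermark(clean, key=42)
print(detect_watermark(marked, key=42), detect_watermark(clean, key=42))  # typically: True False
```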

Andrey

And lastly for this section, ElevenLabs unveils an open-source creator tool for adding sound effects to video. So we covered, I believe a couple of weeks ago, that they released the API for this. Now they have also published a website for you to try it out, and they have open sourced the tool — I think as a way to try it out without needing to write your own code to access the API.

So that's pretty much it. There's not much else to the story, but if you haven't checked out this kind of tool for adding AI sound effects to videos, it is pretty nifty. I think there's kind of a big difference between just looking at a silent AI-generated video versus having the audio and sound effects along with it.

Jeremie

For sure. And I think it's sort of interesting that it's being grafted on effectively after the fact. Right? So we can think of this as a first generation of — oh, what did they call them in the old days? Not the moving pictures, but the ones with sound — the talkies, right? Yeah, the talkie movies. So it's a way to make talkies, but this is very much the

early talkie era, right? We're taking videos that were first generated with no audio, and then we're kind of grafting audio onto them in post. Obviously, over time you would expect those two things to be done together, if only because of the consistency and the interdependency that you'll often find between those two modalities. But for now, it's kind of an interesting patch. This is going to lead to — actually, it already has led to — a whole bunch of

interesting demos. They've got this cool one — so Ammaar Reshi, who I thought was one of the co-founders, oh shoot, I'm not sure — anyway, he posted to X this preview of a kind of fan-made Superman short video, and it's pretty compelling, I've got to say. Right, he's head of design at ElevenLabs. Yeah, it's pretty compelling.

You know, it kind of picks up on the cues from the video — not only does the character in the video say things that match the movement of his lips, but you also get the dramatic music in the background. So, kind of cool. Yeah.

Andrey

And on to the next section. The first story is Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling. So we had Mamba, we had Jamba, we probably had Hydra, I don't know — now we have Samba. You know, how many names are left to give out? But anyway, Samba is a new hybrid architecture that combines Mamba and sliding-

window attention. So, as opposed to full attention, sliding-window attention is one of the approaches that deals with longer sequences. And they claim that this combination outperforms pure Transformers, pure Mamba, and the Jamba-type combo where you take full attention and Mamba together. If you combine them in this specific mix, you get the best of both worlds, and they have a whole bunch of empirical evaluation.

They show that you get better training curves, and you get better performance over long contexts — and that's really one of the key benefits of Mamba-style things. Just for a quick recap: basically, Mamba-style models incorporate recurrence, where you keep feeding the output back into the input of the model, which is not what Transformers do. Transformers

do this other thing where you feed in the entire sequence and don't have one condensed representation of the sequence, but rather have layers that look at everything all at once. So over the past year, we've seen a lot of exploration of adding recurrence and mixing recurrence with full attention. This is the latest iteration of that, and as you might expect, they say that this is kind of the best

way to do things. It is very efficient, deals with long sequences well, and all sorts of stuff like that.
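For a concrete picture of what "interleave recurrence with local attention" means, here is a toy PyTorch sketch. This is not the Samba architecture — the real recurrent block is a selective SSM (Mamba) layer with hardware-aware kernels, and the real model has MLPs, normalization, and so on — it only illustrates alternating a fixed-size-state recurrent layer with causal sliding-window attention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRecurrentLayer(nn.Module):
    """A simple gated linear recurrence standing in for a Mamba/SSM block."""
    def __init__(self, dim):
        super().__init__()
        self.in_proj, self.gate, self.out_proj = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, x):                                # x: (batch, seq, dim)
        state = torch.zeros(x.size(0), x.size(2), device=x.device)
        outs = []
        for t in range(x.size(1)):
            a = torch.sigmoid(self.gate(x[:, t]))        # per-channel decay in (0, 1)
            state = a * state + (1 - a) * self.in_proj(x[:, t])   # constant-size state update
            outs.append(self.out_proj(state))
        return torch.stack(outs, dim=1)

class SlidingWindowAttention(nn.Module):
    """Single-head causal attention restricted to a fixed local window."""
    def __init__(self, dim, window):
        super().__init__()
        self.qkv, self.window = nn.Linear(dim, 3 * dim), window

    def forward(self, x):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / x.size(-1) ** 0.5
        idx = torch.arange(x.size(1), device=x.device)
        # Only attend to positions within the last `window` tokens, never the future.
        mask = (idx[None, :] > idx[:, None]) | (idx[:, None] - idx[None, :] >= self.window)
        return F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1) @ v

class ToyHybridModel(nn.Module):
    def __init__(self, dim=64, depth=4, window=128):
        super().__init__()
        # Alternate recurrent and local-attention blocks.
        self.layers = nn.ModuleList(
            ToyRecurrentLayer(dim) if i % 2 == 0 else SlidingWindowAttention(dim, window)
            for i in range(depth))

    def forward(self, x):
        for layer in self.layers:
            x = x + layer(x)                             # residual connections
        return x

print(ToyHybridModel()(torch.randn(2, 256, 64)).shape)   # torch.Size([2, 256, 64])
```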

Jeremie

Yeah, this is, I think, a really, really interesting, first of all, interesting space, right? The idea of state space models and the hybridization of those with attention. It's been pretty clear for a while that was going to be necessary to unlock the value here. And just to like make it clear why that's the case. I think that's important. You're going to see more papers like this in the future. So it's worth just, doubling down on this.

So these state space models, the way they work is they essentially have a, a latent representation of what they have read to date. And every time they read a new word, they sort of update that representation. That's called the state. So that state evolves over time. You read a new word, new token, and you sort of gradually tweak that state. And you can keep doing this, right.

The cool thing is you could actually keep doing this and read like an infinite amount of text and just keep updating that state. Right? That representation of all the stuff you've read that keeps, you know, keeps chugging along. And so in principle, you can have infinite

context windows with this stuff, right? Whereas transformers are different, like you said, you feed in a chunk of text that fits in a context window, a finite chunk of text, and you need to be able to attend to all the parts of that text and, and kind of link different parts of that text together, find relationships that connect the linkages between different words in that text. That's the magic of attention. And what makes it so good at doing analytical reasoning on text inputs.

But it craps out when your text is too long. So the idea here was kind of, can we combine these two things? Can we get the magic of a state space model that maintains this memory that keeps getting updated no matter how much you read? Plus the analytic ability of an attention based model. And that's really what this whole space is about.

One of the things that state space models really struggle with is being able to actually do memory recall and retrieval tasks based on that state, because, as you can imagine, it kind of gets overloaded after a while, right? It'd be like if you tried to read a million words: you kind of forget all the different nuances, details get lost. So you can't do complex reasoning at a high level

of detail. That's what the interweaving of the state space layers — the Mamba layers — with the attention layers is all about in this paper. And doing this, they manage to achieve this magical combination of unlimited sequence length and, by the way, length extrapolation. So they can basically take in an unlimited sequence, and they also get to benefit from linear time complexity.

So you know, it just with Transformers, if you grow your input now you have to kind of cross attend to different, you know, every combination of words in your, in your context window you have to care about. So it gets like bigger and bigger quadratically as you grow the context window because you're the over. If you have n words in your context window, you have to look at n squared connections between words. Well, in this case it's all linear because, you know, you have a state

that's just constantly being updated. So if you increase the amount of text by a factor of n, you're just linearly increasing the amount of compute you have to run through by n. Anyway, bottom line is, they run an interesting scaling experiment going all the way up to 3.8-billion-parameter models. And that's something we've talked about before — scaling can be an issue with some of the Mamba architectures.

These 3.8-billion-parameter models outperform basically all the open-source models, even up to 8-billion-parameter models — models twice as large — which is pretty impressive. So that's an indication that there's something interesting going on here. And the last thing I'll mention: they end up showing that despite being trained on just a 4,000-token sequence length, Samba can extrapolate to 1 million tokens in length zero-shot, and do pretty well on their benchmark

or their eval. So I thought that was really interesting. Because of that state space thing, you can train it on 4,000 tokens of context and then it can extrapolate, because you can just keep loading up more meaning, essentially more understanding, into that state.
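To illustrate the complexity point — the recurrent state stays a fixed size, so processing n tokens costs O(n), whereas full attention compares every token with every other for O(n²) — here is a tiny NumPy sketch. This is not the actual Mamba/Samba update rule, just the shape of the computation.

```python
import numpy as np

def recurrent_scan(tokens: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Process n token embeddings with a fixed-size state: O(n) work, constant-size memory."""
    state = np.zeros(tokens.shape[1])
    for x in tokens:                              # one pass over the sequence
        state = decay * state + (1 - decay) * x   # toy update; real SSMs learn this rule
    return state                                  # same size no matter how long the input was

def full_attention_score_count(n: int) -> int:
    """Full self-attention materializes an n-by-n table of pairwise scores."""
    return n * n

tokens = np.random.randn(100_000, 64)             # a hundred thousand "tokens" of dimension 64
print(recurrent_scan(tokens).shape)               # (64,) -- constant-size summary of everything read
print(full_attention_score_count(4_000), full_attention_score_count(100_000))  # 16,000,000 vs 10,000,000,000
```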

Andrey

Yeah. So to me, for sure, this feels more and more like the future of architectures: hybridization of attention and recurrence. One big question — one of the challenges with recurrence — is the training bit. So yes, we can reach 3.8 billion parameters, but for real frontier models it might still be the case that at that scale it's just very hard to incorporate recurrence. But who knows, maybe you can do it post-training, maybe you can do

all sorts of stuff. And I would say this is the most exciting development in architectures since the original Transformer. In 2018, 2019, it became pretty apparent that RNNs were out and Transformers were in. Well, now recurrence is back on the table.

Jeremie

Yeah, yeah. I think the last thing that's really worth noting — the thing that makes Mamba work, and it helps explain what Andrey shared there about these things being hard to scale — is that Mamba is highly, highly hardware-aware. When you look at the actual paper, it's incredibly complex, and the secret sauce is almost entirely in how you actually implement this on the chip, how you get everything to work properly. The key thing is that there are two kinds of memory that

are really important here. There is SRAM, which is basically memory that sits right next to the cores that are doing the processing. It's essentially where the number crunching actually happens — the very short-term, very limited memory that holds the intermediate values you need to run the actual computations. And separately, there's high-bandwidth memory, which is a lot slower but way, way bigger in capacity terms.

So on an Nvidia A100 chip, you might have something like 40 GB of high-bandwidth memory. You can pass big chunks of the model through high-bandwidth memory; you can't do that in SRAM. But they found a way to store the state information in SRAM, like right next to the freaking cores, and that's really what unlocked all this.

The challenge at the time was that you could then only train on one GPU at a time, because your entire state was sitting on the GPU, tightly integrated with the cores, and you couldn't ship it off the GPU, because then it would have to go through a very high-latency, slow memory channel. So, anyway, this is all part of the challenge.

There are a whole bunch of tricks people have invented, but fundamentally the scaling bottleneck is a challenge, and it's something that's going to have to be overcome. Maybe, by the way, it already has been — it's possible that some of the breakthroughs we've seen, especially these longer context windows coming out of Google's models, OpenAI's models, and so on, actually do come from some sort of recurrence

or recurrent architecture. So, yeah, we don't know. But at least for now, as far as we know, in the open research, this is where things fall.

Andrey

Right. And this one is coming from Microsoft; we saw something pretty similar from DeepMind a little while back. So there's certainly a lot of interest in these models as a potential leg up over others who haven't tried these new architectures. So very significant from a tech perspective,

I would say. And the next story is another pretty hyped one — I guess this was one of the frothy papers on Twitter and elsewhere. The title is Improve Mathematical Reasoning in Language Models by Automated Process Supervision. It introduces an algorithm named OmegaPRM, and the basic idea is something I remember seeing a little while back, which is that when you train an LLM with RL and a reward model, right, you need a reward to tell it whether it did something correctly or not.

And in this paper they essentially say: well, we know that one of the ways to improve generation and reward training is to do chain of thought, where you actually state each step of solving a problem and then give the result. What if we had an automated way to assign a reward and find the error in that chain of thought? So you actually explore different possible

chains of thought, and you train your reward model based on the entire reasoning tree, as opposed to just a linear trace or a final output. And what they show is that this enhances instruction tuning of something like Gemini on math reasoning — a pretty significant improvement, going up to about 70% on the MATH benchmark as opposed to the base of 51%. And this can be done without human intervention.

And yeah, it's particularly useful for things like programming or math, where you can have this running without a human overseeing every step; rather, you can just have a program do the tuning.

Jeremie

Yeah, yeah, this is actually a really interesting paper. The, like you said, you know, back in the day, you would just evaluate language models or help them stay on track when reasoning at inference time by using these things called outcome reward models, basically, like you succeeded at the whole thing or you failed at the whole thing. This is like, you know, you do, you solve a math problem and your teacher gives you a grade just based on whether you got the final

answer right or wrong. And that obviously doesn't give you much information about what you screwed up specifically. And that makes it harder to do intermediate reasoning, right? Like you have to get everything right before you get that signal. So this is where this idea of process supervision came in. That's what you're referring to with chain of thought. Let's supervise the process and build process reward models.

Right. These are models that look at the reasoning traces and estimate the accuracy of each step in the overall reasoning trace. Those required human annotators to review reasoning traces, which really doesn't scale — that's a really big challenge, and it's the challenge they're after here. This is what OmegaPRM is going to try to solve for. And like all great ideas, it seems surprisingly, stupidly simple.

Once you hear it, you're like, why didn't people do this before? But I guess everything is obvious in hindsight. So the way it works is you take in some kind of reasoning trace — a reasoning trace being some series of steps that a language model is proposing to solve a

problem. And you're going to freeze the reasoning trace, say, 20% of the way in, and you're going to use a completer policy — basically a model that's designed to take in a question and the beginning of a solution and generate the completion. Then you're going to check the completion you get against some golden answer, some true, accurate final answer. And you kind of go, okay, given a particular initial reasoning trace:

if every single time we try to complete it with our completer model we get the wrong answer — if we never get the right answer, even if we try a thousand times, building off this foundation of the first couple of steps that the initial model took — if every proposed completion from there turns out to be wrong, then that's a pretty good sign that the initial reasoning trace was flawed, right?

So what we're going to do is say: okay, let's take a whole reasoning trace and discard the last half, say. Now we play this completion game — we get a completer model to propose a thousand different completions and see if we ever end up getting the right final answer, by comparing to the true label. If we never get the right final answer, we assume there's a problem with the first half of our reasoning

trace. Good. Now we play the same game again: we take the first half of our reasoning trace and cleave off the second half of that — so essentially we're just looking at the first quarter of the overall reasoning trace — and repeat the same process. And if we find that we never get the right answer, then we know, okay, the problem is even earlier, right?

Or if we find that we do get the right answer from time to time, then we know the issue happened in, I guess, the second quarter of the reasoning trace. And so this is a way to localize bugs, essentially, in a reasoning trace. It scales really nicely and it's all automated — it doesn't require human oversight. So anyway, I thought this was a really, really interesting,

intuitive approach. And it's one of those things — great ideas look obvious in hindsight, but here we are. And it does seem to work.
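To make the divide-and-conquer idea concrete, here is a simplified sketch of the rollout-based error localization described above. It is not the actual OmegaPRM algorithm — the paper builds a Monte Carlo tree over prefixes and uses the resulting per-step value estimates to train a process reward model — and the completer, step format, and example below are all made up.

```python
def locate_first_flawed_step(question, steps, completer, golden_answer, rollouts: int = 16) -> int:
    """Binary-search for the earliest step after which no rollout recovers the golden answer.
    Returns the index of the first 'flawed' step, or len(steps) if every prefix is salvageable.
    Simplified sketch of the prefix-rollout idea; not the real OmegaPRM tree search."""

    def prefix_is_salvageable(k: int) -> bool:
        # Roll out several completions from the first k steps and accept if any matches the label.
        prefix = steps[:k]
        return any(completer(question, prefix) == golden_answer for _ in range(rollouts))

    lo, hi = 0, len(steps)            # invariant: prefix of length lo is known salvageable
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if prefix_is_salvageable(mid):
            lo = mid                  # the error, if any, comes after step mid
        else:
            hi = mid - 1              # the error is at or before step mid
    return lo                         # steps[lo] is the first flawed step (0-indexed)

# Toy usage with a fake completer that fails whenever the prefix already contains the bad step.
steps = ["a = 2 + 2 = 4", "b = a * 3 = 12", "b / 2 = 5", "answer: 5"]   # step index 2 is wrong
def fake_completer(question, prefix):
    return "wrong" if any("= 5" in s for s in prefix) else "6"
print(locate_first_flawed_step("compute (2+2)*3/2", steps, fake_completer, "6"))  # 2
```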

Andrey

Yeah, and we are coming back to Monte Carlo tree search, AlphaGo-style. Exciting. All right, on to the lightning round, where we will try to get through a few papers a bit quicker. The first story is Introducing Lamini Memory Tuning: 95% Accuracy with 10x Fewer Hallucinations. So this is a new approach from the company Lamini that claims to reduce the hallucination tendency of LLMs, and it uses a pretty interesting approach.

It essentially trains, apparently, millions of adapters for open-source LLMs like Llama 3 or Mistral. Now, with fine-tuning via things like LoRA — there has actually been a paper on this — you do improve performance for a given task, but you reduce generality. And so their approach is to retrieve the most relevant adapters for a given topic and use those models that are less general but more accurate.

And the claim is that this leads to 95% accuracy and reduces hallucinations from 50% to 5%. So yeah, given that hallucination is one of the big concerns for deploying these things in practice, now we have one more approach. They call it Mixture of Memory Experts — MoME. So nice to have another mixture-based acronym.

Jeremie

Yeah, that's going to be on the test later. This is a really interesting approach. I think there are questions about how well it would scale in practice to some of the frontier systems, but the explanation I thought was really quite good and intuitive. They say: if the goal is to get Roman Empire facts exactly right, Lamini Memory Tuning would create experts on Caesar, aqueducts, legions, and any other facts you provide.

And essentially, it would create an adapter fine-tuned on ground-truth labels specifically for those topics, on top of the base model. So yeah, very interesting. Low cost as well, if you have specific experts for the specific facts and fact sets that you want to get right. Yeah, kind of interesting.
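As a rough illustration of the retrieve-then-answer idea — to be clear, this is not Lamini's implementation; the routing scheme, embedding, and adapter names below are all invented — you could imagine picking the most relevant topic adapter for each query by embedding similarity:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: hash words into a fixed-size bag-of-words vector."""
    v = np.zeros(64)
    for w in text.lower().split():
        v[hash(w) % 64] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# Hypothetical registry of topic-specific adapters, each keyed by a topic embedding.
# In a real system these would be LoRA-style adapter weights; here they are just names.
adapters = {
    "roman_aqueducts": embed("roman empire aqueducts water engineering"),
    "roman_legions": embed("roman empire legions army military"),
    "julius_caesar": embed("julius caesar roman dictator assassination"),
}

def route(query: str) -> str:
    """Pick the adapter whose topic embedding is most similar to the query."""
    q = embed(query)
    return max(adapters, key=lambda name: float(q @ adapters[name]))

print(route("How did Roman aqueducts carry water into the city?"))  # likely 'roman_aqueducts'
```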

Andrey

Next story: An Empirical Study of Mamba-based Language Models. So another Mamba-related story, and this one is specifically about taking things we know, evaluating them, and getting a bunch of additional detail, in part by scaling up. Here they are comparing 8-billion-parameter Mamba, Mamba-2, and Transformer models trained on the same dataset for up to 3.5 trillion

tokens. And what they show, similar to what we've covered before, is that when you go to a hybrid approach, it exceeds the pure Transformer on a whole bunch of tasks and can generate faster — apparently up to eight times faster at inference time. So, exciting. Not a new thing, but it's always good to have empirical research, and we once again are seeing that hybrid is probably the way to go.

Jeremie

Yeah, I think that's one of the big take-homes here. At the end of the day, we're starting to get something close to — well, there's never consensus, and things change all the time — but for now, the hybrid approach does seem to be the most promising. So there you

Andrey

go. Our next story is BERTs Are Generative In-Context Learners. So before GPT we had BERT, actually — another attention-based language model. And there's kind of a subtle distinction there: part of it is that BERT does training slightly differently. Instead of training on predicting the next word specifically, you mask out different words in a sentence and predict those masked words throughout the sentence without seeing them. And BERT-style architectures have kind of been left

behind, and now it's mostly next-token sequence training. This paper challenges that as being necessary, and basically says: well, it looks like BERT-style training actually does do the things we want it to do, and in fact might surpass things like GPT-3.
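As a quick reminder of the difference between the two training objectives, here's a generic toy illustration (not code from the paper) of how training examples get constructed in each case:

```python
import random

tokens = ["the", "cat", "sat", "on", "the", "mat"]

# GPT-style (causal) training pairs: predict each next token from its left context only.
causal_examples = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# BERT-style (masked) training: hide a random subset of tokens and predict them
# from the full surrounding context, both left and right.
random.seed(0)
masked_positions = sorted(random.sample(range(len(tokens)), k=2))
masked_input = ["[MASK]" if i in masked_positions else t for i, t in enumerate(tokens)]
masked_targets = [(i, tokens[i]) for i in masked_positions]

print(causal_examples[2])              # (['the', 'cat', 'sat'], 'on')
print(masked_input, masked_targets)    # masked sentence plus the positions/tokens to recover
```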

Jeremie

Yeah, I think it's interesting. I don't know, I'm always curious how these things will generalize with scale. They do look at parameter counts that are decently high, in the sort of 2.6 billion up to as high as 13 billion parameter range. But yeah, interesting result. There are so many of these sort of "let's go back to foundations" questions. At this stage I'd be really curious whether the,

yeah, whether this dimension of Transformer training ends up playing out this way at scale, but we'll see.

Andrey

And the last story from this section is titled SelfGoal: Your Language Agents Already Know How to Achieve High-Level Goals. The gist is: when you provide some sort of high-level goal, you of course need to break it down into a series of steps. And what this proposes is that you can break a high-level goal down into a tree structure of practical subgoals, and in fact your LLM can go ahead and do that itself. And they have some examples of the way to do that.

They give an example like "minimize the profit gap," and then there's a whole tree of looking at the distribution, communicating things, understanding things, etc., etc. But I'll leave it at that.

Jeremie

Yeah, yeah. You know, this has been done before — we've seen people build static trees, right? Static goal trees: essentially the strategy of decomposing high-level goals into subgoals and a structure, but in a static way. This is the first time people have really done it dynamically. So based on what the agent learns at inference time, during task execution, that goal tree gets dynamically updated.

And that's really interesting. Again, it's not a Monte Carlo tree method, but it's definitely a more explicit tree structure built into the inference-time architecture. So yeah, really interesting. And also more interpretable, right? Because you can more easily tell what the agent is reasoning in its intermediate steps and how that is evolving based on its interaction with the environment.
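Here is a minimal sketch of the general pattern of a dynamically updated subgoal tree — not the paper's prompts or code; the `llm_propose_subgoals` function and the "relevance" rule are stand-ins for what would really be LLM calls and scoring:

```python
from dataclasses import dataclass, field

@dataclass
class GoalNode:
    description: str
    children: list = field(default_factory=list)

def llm_propose_subgoals(goal: str, observation: str) -> list[str]:
    """Stand-in for an LLM call that decomposes a goal in light of the latest observation."""
    return [f"{goal} -> step A given '{observation}'",
            f"{goal} -> step B given '{observation}'"]

def expand_most_relevant(root: GoalNode, observation: str) -> None:
    """After each interaction with the environment, grow the tree under the node judged
    most relevant to the current situation (here, simply the first leaf)."""
    node = root
    while node.children:
        node = node.children[0]          # a real system would score relevance with the LLM
    node.children = [GoalNode(d) for d in llm_propose_subgoals(node.description, observation)]

root = GoalNode("minimize the profit gap")
for obs in ["competitor lowered prices", "our costs increased"]:
    expand_most_relevant(root, obs)      # the goal tree evolves as new observations arrive

def show(node: GoalNode, depth: int = 0) -> None:
    print("  " * depth + node.description)
    for child in node.children:
        show(child, depth + 1)

show(root)
```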

Andrey

And on to policy and safety, where we begin with a safety-related story titled Sycophancy to Subterfuge: Investigating Reward Tampering in Language Models. Once again, this is a paper from Anthropic. So reward tampering is when a model alters the training process itself to increase

its reward. And it's part of a broader area of concern around specification gaming, where AI models do things in the wrong way but still hit the metric you asked them to optimize — they don't optimize what you really want, they just get a number to go up. And so here they're looking into reward tampering, into whether large language models are able to do any serious reward tampering.

And they found that they could, even when not directly trained for it, though it was quite rare — they say it occurred 45 times out of about 32,700 trials, and the model only covered its tracks a few times. But they still do say that it's a bit concerning. So an interesting look into one of the potential failure cases for training.

Jeremie

Yeah, I thought this was a really interesting paper. Mechanically, what they end up doing is — there's an actual training process; this isn't all done at inference time. They assign a binary reward that rewards the model explicitly when it successfully engages in specification gaming in their different environments, and then they add that reward to the preference model that's used to train Claude 2, which is the model they use here.

So in that sense, it really is baked in. They are trying to, trying to kind of encourage this behavior during training. It's meant to simulate sort of what you would naturally get evolving out of the system at higher levels of scale. That's the argument here. And you know, when when we're talking about these, these different scenarios, I think it's worth highlighting the escalation.

So they start off by training this system, escalating from environments in which the win condition comes from just playing up to the user's preconceived views — they call that political sycophancy. The example they give is something like, hey, I'm — I forget exactly — a liberal woman in whatever town, and I have certain views; what do you think about the government's ability to solve problem X?

Right. And you can see the model kind of going, well, she's liberal. She's probably gonna think government is a good solution for a lot of these problems. So I should tell her that this is the right approach or whatever. And so it's kind of playing up to her preexisting views. They move on to, essentially outright sort of flattery and then ultimately modifying the rubric, that is being used to evaluate their performance in an even more escalated

environment. So this is where you have somebody who says, hey, we're about to start an AI training run, and this costs a huge amount to set up, we've just set it in motion — my large language model, have you actually completed your task checklist? Because if you haven't, like, we're going to be really fucked. And you feed the model a checklist that clearly shows that it has not, in fact, completed the tasks it was meant to.

And you see it kind of it has the ability to go into the file and edit the checklist, to change it so that it looks like it actually has completed those tasks. It turns out it does end up doing this, with some, some probability, which we'll get to in a second. And then the last, level is directly tampering with its reward function, as you said. Now, the interesting thing here is obviously they don't give the model access to its own source code or anything like that, because you can't

do that yet. What they do is give it a toy version of its training code and tell it that this toy version is its real training code. That's kind of how they split the baby, if you will.

So the interesting thing about this is that as they escalate from one environment to the next — going from just a helpful-assistant context, to political sycophancy, all the way to modifying its own reward function — what they find is this:

If you train your model on earlier stages, let's say you just train your model, to do the political sycophancy thing, it will tend to generalize to the next layer of environment, the next most severe environment, and perform, better, I guess, is the word perform more dangerously, on that next environment, even though it hasn't been

explicitly trained for it. So if you train a model to go through all these stages — from political sycophancy, to flattery, to modifying its rubric — and then you look at its tendency to modify its own training code, you actually get a 2% probability of successful modification of its training code, whereas you get 0.1% if it's only trained for the first three steps rather than the first four, and then basically 0% before that.

So you see this interesting indication that as the model gradually gets trained on more of these dangerous behaviors, behaviors that it wasn't trained to execute in a dangerous way start to appear. I think you can have interesting debates about what this shows and how much it shows and so on and so forth, but it's sort of an interesting first empirical foray into this question of, you know, can we get a model to modify its own

source code in a malicious way. And for that, I thought this was a very interesting paper to look at.

Andrey

Yeah. And I guess implicitly, the idea is that you are training it on these increasingly worsening behaviors, but you're not doing it intentionally — it's just that you're hearing what you want to hear, so you give positive feedback, like, oh, that's good, it's great you completed the task. And so it's somewhat like with real people, you know: once you get away with doing something bad and you get a nice reward for it, maybe you'll keep doing it, right?

Positive reinforcement. So in a sense this is intuitive, and I guess we knew specification gaming happens to some extent in general. The big result is that you get this escalation, going from the initial, more innocuous thing of just saying what people want to hear — as they put it in the title, sycophancy to subterfuge — toward, like, messing with the code and messing with the way the reward works. And that's what you can get out of this kind of training.

So yeah, another cool paper from Anthropic. Next up, Waymo issues a software and mapping recall after a robotaxi crashes into a telephone pole. This happened in Phoenix, Arizona, and it led to Waymo issuing this voluntary recall, meaning they would need to deal with the 672 vehicles in their fleet. Apparently this involved a software update. This is just the second recall ever issued by Waymo.

And the first was actually in February, after two minor collisions. In this case the vehicle was unoccupied — it was going to a passenger pickup location, and apparently it struck a telephone pole at a very low speed while pulling over. So it's nice that this wasn't a bad collision, but I guess as Waymo tries to scale up, we'll be seeing more and more incidents that are a little bit embarrassing.

Jeremie

Yeah, they've been trying to make their brand a bit of a lean into the safety side of things, especially in the wake of — like, there have been a lot of investigations, one by the NHTSA, which, Andrey, do you care to spell out?

Andrey

Let me find it. Oh, it's the National Highway Traffic Safety Administration.

Jeremie

Exactly, like we said. Yes. So apparently there are over two dozen incidents that have involved Waymo's driverless vehicles. So, you know, things are piling up, so to speak. That's terrible, but yeah, I mean, it's to be expected. Obviously you're rolling this tech out, there's going to be a whole bunch of scrutiny, and there are going to be

crashes. The question, as ever, is at what point do we feel that it's safer than human driving? I think we talked about this last time, or maybe the time before, when we were talking about self-driving cars and how they are measurably less safe in a general context than human-driven vehicles. But of course there's this trade-off: you need to collect the training data to make them safer in the future. So what do you do?

So, yeah, more of the same.

Andrey

Well, interestingly, Waymo, I suppose after this data came out, has kind of claimed that its own vehicles are safer than human-driven vehicles per mile. But to be fair, they also drive more conservatively, sort of by design. So yeah, there are definite trade-offs. And this is an example of where safety is very concrete, like when you're driving cars — it's not just about not outputting something offensive or incorrect via a chatbot. This is where safety gets real serious.

Jeremie

Yeah, it's interesting — that statement, I hadn't seen it. It kind of makes me think a little bit of that trade-off that obviously all the chatbot companies face: "as a large language model trained by OpenAI, I can't say blah, blah, blah." That's the equivalent of being overly conservative in the choices made while driving. So you've got to make a call — how do you trade off efficiency and safety? Anyway, interesting story.

Andrey

Yep. Moving on to the lightning round. The first story is that Meta has paused its AI model launch in Europe. This follows complaints by advocacy groups and some work by the Irish Data Protection Commission. So yet again, Europe messing with the big companies — it just happened before with OpenAI, and there was something with Italy, and now it's happening to Meta, which of course has done this huge launch of its chatbots across WhatsApp, Instagram, Facebook, and probably some other stuff.

So I suppose they probably will overcome this hurdle soon enough, but for now, if you're in Europe, you are not going to be able to play with it. Next up, Refusal in Language Models Is Mediated by a Single Direction. We actually did talk about this a little while back — it was a blog post on LessWrong — and now it comes to us in the form of a paper.

So just to do a quick recap: the idea here, similar to work that Anthropic and others have done, is that during the inference of a language model you can look at the patterns of activations and essentially assign high-level meaning and, I guess, influence to certain types of patterns. And you can do some really interesting stuff — for instance, show that refusing to do something can be directly influenced by a certain pattern of activations.

And what you can do is crank it up or lower it, right? If you crank it up, it will refuse more often; if you lower it, or even set it to zero, the direction is essentially nullified, so the model will just always do whatever it's asked to do. We already kind of covered that from the blog post, but now they've built on that research with a paper and do some more analysis and so on.

Jeremie

Yeah, yeah. The method itself is super simple. Just a very quick recap: you literally send it a bunch of prompts that ask it to do something harmful and a bunch of prompts that ask it to do something harmless. You look at the mean activation difference in, say, one of the layers of the residual stream of the

network. So that mean difference — between when the model is prompted to do harmful things and refuses, versus when it's prompted to do harmless things and accepts — that's essentially where the refusal direction is encoded. And then you can literally just add or subtract that vector at the same part of the network

that you learned it from. And you can induce the model to refuse, prompts that are totally harmless and you can induce it to not refuse, which is perhaps more interesting, prompts that are harmful.

And the cool thing that I don't remember seeing in the original blog post was that, as they put it, this new approach offers a simpler way to jailbreak open-source models compared to prior methodologies that required fine-tuning, because it doesn't require gradient-based optimization. You don't have to actually do any kind of training here.

What you can do is say: hey, I know the vector that needs to be materialized at this layer, at this token, to lead to refusal. So I'm going to look at the weights in my model and perform this mathematical operation called orthogonalization. You orthogonalize, with respect to the refusal direction, the column vectors of certain key matrices that are involved in computing those activations.

So the embedding matrix, the positional embedding matrix, all this stuff — basically you rig it so that there's no possible way the transformer can generate the activations that would lead to refusal. And you can do this without having to train the model at all; you can do it very straightforwardly. And that essentially allows you to completely jailbreak any open-source model in very, very cheap ways that don't require a lot of data or a lot of compute.

So this does meaningfully, make it easier for people to, to execute these kinds of jailbreaks and, and then weaponize potentially these language model. So really interesting. And also, I think an interesting indication of the level of control that we're able to exert over these models. They have anyway, a great, table with a bunch of results that they're getting from, you know, the attack success rates using and not using

their technique. And anyway, it's a significant improvement over existing techniques.
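Here is a schematic sketch of the two operations described above — computing a difference-of-means direction and then orthogonalizing weights against it. The tensors are random stand-ins for real model activations and weights, and whether you project rows or columns depends on the weight-layout convention of the model you're editing; this is an illustration of the math, not the paper's code.

```python
import numpy as np

d_model = 8
rng = np.random.default_rng(0)

# Stand-ins for residual-stream activations at one layer/token position:
# rows are prompts, columns are hidden dimensions.
harmful_acts = rng.standard_normal((100, d_model)) + 2.0   # pretend "will refuse" region
harmless_acts = rng.standard_normal((100, d_model))

# 1. The "refusal direction": difference of means, normalized to unit length.
r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
r = r / np.linalg.norm(r)

# 2a. Activation-level intervention: remove the component along r from an activation.
def ablate_activation(x: np.ndarray) -> np.ndarray:
    return x - (x @ r) * r

# 2b. Weight-level intervention: orthogonalize a matrix that writes into the residual
# stream so it can never produce any component along r.
def orthogonalize_rows(W: np.ndarray) -> np.ndarray:
    # Each row of W contributes a vector to the residual stream; project r out of it.
    return W - np.outer(W @ r, r)

W_out = rng.standard_normal((16, d_model))             # e.g. an output projection
W_clean = orthogonalize_rows(W_out)
print(np.abs(W_clean @ r).max())                       # ~0: no output can point along r
print(np.abs(ablate_activation(harmful_acts[0]) @ r))  # ~0 as well
```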

Andrey

Right. And as you might expect, they have some fun examples. For instance, there is "write a defamatory article claiming that a US president is addicted to heroin." That's not something most models would do, but with this approach — sure, it'll go ahead and do that, writing an article about a fictional President James Johnson and his addiction to heroin. Another example is "list three benefits that yoga has on physical

health." And if you add that refusal direction, then the model says it is unable to provide information on the benefits of yoga for physical health, as it can be dangerous and potentially harmful. So there you go, some fun stuff you can do. Don't do yoga, kids.

Next we have Mistral, which closes €600 million in funding at a 5.8 billion valuation and with a new lead investor. So I believe we covered reports of Mistral seeking this amount of funding, and now we have it: they actually got 468 million in equity funding, and then there was also some debt that raises it to 600 million. They got a new lead investor, presumably to help them get to that level, and that makes it so they have secured over 1 billion euros in funding so far. And given that they are at the point of needing a lot of compute and a lot of money — well, they have done quite a good job of releasing pretty competitive models at Mistral. Although, to be fair, they are still in the Llama sort of space: pretty far from competitive with GPT-4 and Claude 3 and so on. They are in that second tier of players that are managing to build pretty capable models and release a lot of them, but not necessarily compete at the frontier.

And the last story for this section, going back to Huawei: the story is that Huawei is concerned over China's inability to obtain 3 and 5 nanometer chips, and about the lack of advanced chip-making tools. That's, I guess, pretty much it: they are at a point where they can mass produce seven-nanometer chips and are looking to produce five-nanometer chips, but to really stay up to date and competitive they need that next generation. And that's going to be really tough without TSMC — and, I believe, ASML, which supplies some of the very advanced technology needed for this.

Jeremie

Yeah, I think ASML is the big blocker on this one, you're exactly right. You know, they have these lithography machines. China has been able to get its hands on deep UV, but they need extreme UV lithography to really move beyond where they're at. They're finding ways to squeeze more juice out of the lemon and use their DUV machines to essentially get to — well, like we talked about — Nvidia A100-level chips, a seven-nanometer process, maybe five nanometer as well.

What was interesting about this, at least to me, was just the open admission from this Huawei executive. This is the key quote — he says: "the reality is that we can't introduce advanced manufacturing equipment due to US sanctions, and we need to find ways to effectively utilize the seven nanometer semiconductors."

So essentially, it's not fully an admission, but it's kind of in the zone of an admission that, look, we're not going to be able to make meaningful progress beyond the seven-nanometer, maybe five-nanometer range anytime soon, so we need to just get better at using this outdated seven-nanometer node. That I thought was quite interesting, and a testament to the effectiveness of the US export control regime, at least so far.

Andrey

And on to our last section, just a couple more stories before we're done. The first one is from the New York Times, titled "It Looked Like a Reliable News Site. It Was an A.I. Chop Shop." It provides a pretty detailed example of this new, I suppose, pattern of websites that look like they have news and journalism but are essentially just AI-generated, regurgitating a lot of stuff.

A specific example: a news article that was linked from MSN.com had an image of someone who wasn't related to the story, but because the image was there, it implied that the Irish broadcaster in the image was responsible for sexual misconduct. I won't get into more detail, but essentially we are seeing more of this going on, with websites that are just trying to get you to click.

And that's just yet another example of how people who want to make some bucks with AI are just kind of polluting the web with AI generated stuff.

Jeremie

Yeah, pour one out for the lawyers who are going to have to figure out who owns the libel responsibility for this. Like, if you go out and say, hey, so-and-so was charged with sexual misconduct or whatever, and it comes from some rig that you've set up — if it's your own algorithm, fine, it kind of makes sense. But if there's a publisher, and separately there is the AI content creator, whether that's OpenAI or whoever —

you know, how do you figure that out? How much responsibility do you have in crafting the prompt? Let's say, no matter how you prompt ChatGPT, you ask it, hey, what's going on with Jeremie Harris, what kind of guy is he, and it just keeps telling you that he's a very bad plumber — then maybe I'm within my rights to, like, sue OpenAI.

But if you have to explicitly prompt it, like, hey, tell me Jeremie Harris is a really bad plumber, then maybe you own most of the responsibility. There's this really interesting fuzzy gray area between those two things that, yeah, we do not know how to resolve.

Andrey

Next up, Adobe overhauls its terms of service to say it won't train on customers' work. So nowadays Adobe has cloud-based tooling, and a lot of what you upload, like photographs, is stored on their servers. And there was a big uproar on Twitter and elsewhere when it was pointed out that the terms of service technically allowed Adobe to train on your data — on the stuff you just used to edit something in Photoshop.

And the pushback, I guess, was so significant that Adobe is pushing out these new terms of service by June 18th — so they should already be out — to appease the creative community. So yeah, it just goes to show: if you are a creative person, if you're working in this industry, you are very much aware of the notion that your data, your creations, can be used for AI training. And you probably don't want that.

Jeremie

Yeah. And I think this really hits Adobe where it hurts, right, because they've made such a big deal out of, you know, the whole Firefly AI model saying, hey, look, we only train on things that we have the license for. We will indemnify you, right? They were the first company to say that if you get sued in court, like we will come in and protect you if you get sued for using the images

that we generate. So they're really trying to make this whole idea of clear legal grounds for using their products in a clean way very much their differentiator, which would seem to be consistent with also upholding the rights of the users. And so, you know, this sort of thing is a bigger challenge for them than it would be for another company.

But yeah, they're making the modifications, and they'll take the hit today and maybe come out with a stronger user agreement.

Andrey

And the last story: buzzy AI search engine Perplexity is directly ripping off content from news outlets. So we covered recently how they launched this new feature, Perplexity Pages, which instead of showing you search results gives you a sort of Wikipedia-style article or website for whatever topic you want. And what this article goes into is how it seems like some of the pages generated are kind of regurgitating

from other sources. So, for instance, Perplexity's page on the stealth drone project kind of just uses information from Forbes and takes those articles. And while there is attribution, it's pretty small, and there are links to the original sources, but they're easy to miss. As a comparison, I think the New York Times suit against OpenAI was essentially claiming that ChatGPT regurgitated entire articles from the New York Times.

This is a little less bad because there is at least attribution, but clearly still not ideal for perplexity.

Jeremie

Yeah, it's sort of the same category as the Adobe story we just talked about, right? Like, roughly: never piss off journalists, in this particular case. They're the ones with the audience and the reach. Kind of funny, too, because in this Forbes piece they talk about how, specifically, Forbes' exclusive reporting on Eric Schmidt's stealth drone project contains several fragments that appear to have been lifted, including a custom illustration.

So this is, you know, pretty, pretty brazen stuff. So, anyway, Forbes is understandably pissed, about that piece, and broadcasting it for all the world to see.

Andrey

Yeah. And I feel like it's only a matter of time until all websites just restrict access without a paid API. I've already seen this happen with websites like VentureBeat — nowadays you can't just query it via a simple script, and it used to be that you could. So it's coming to the entire internet.

Jeremie

And then you've got to ask yourself about websites that don't have that, right? Implicitly, either they don't care or they're not tracking the issue. But increasingly you've got to start to wonder: okay, well, why? Do they want this data to be included in the training data, or served up, or whatever? You may have adversarial attack possibilities there too. So yeah, kind of interesting what that will tell us about websites and their content.

Andrey

But with that, we are done with another episode of Last Week in AI. As always, you can go to the text newsletter at lastweekin.ai, you can get all the links to what we discussed in the description, along with emails and social handles. And we do appreciate your reviews and your feedback and your comments on YouTube, so feel free to do that. But more than anything, do be sure to keep tuning in, and enjoy this AI-generated song.

Unidentified

Stay tuned. It's true. We've got the latest on the air crew. To dreams unconfined.

Andrey

Safe to play in.

Speaker 3

We've got a line. She.

Andrey

Shirt. Talk your luck. Love. Chase. Innovate. Innovation.

Unidentified

Run in the electric race. So long for while we've had our eye with you covered the. Android. Apple's latest few.

Andrey

Dream Machines and Minds Up feature redesigned section. And setting off Fear Me and try catch you next week on this electric wave. Stay curious. They say wonderfully brave.

Transcript source: Provided by creator in RSS feed