Hello and welcome to Skynet Today's Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. You can also check out our Last Week in AI newsletter at lastweekin.ai for articles we did not cover in this episode. I am one of your hosts, Andrey Kurenkov. I finished my PhD focused on AI at Stanford last year, and I now work at a generative AI startup.
And I'm your other host, Jeremie Harris, the co-founder of Gladstone AI, which is a kind of national security and AI safety company, focusing increasingly on AGI and AGI-like systems.
So I do want to say, by the way, last episode I made this just kind of offhand remark at the beginning, saying that, like, we're looking for partnerships and sales on our defense-focused stuff and our intelligence-community-focused stuff, and we got a whole bunch of outreach, not just from amazing people who I think are going to be so, so great to talk to.
And I've got calls booked with them already, but just from people who were like, I can't help necessarily with this, but I just want to like, be supportive. And I honestly like that was super humbling and amazing how supportive our audience is. So just like a big thank you to everybody who listens to the podcast, whether you're a regular listener or you're just tuning in every once in a while, like, I was blown away by it. And, yeah, just super appreciative. So thank you so much.
Yeah. And we got some useful comments also after that last episode; we got a shout-out to one story we didn't cover last week that we will be covering this week from someone, so that was also very helpful. Nice to see. So yeah, thank you. And do feel free to comment on any platform we are on, YouTube or Substack, or feel free to email. As always, we have emails in the episode description, or you can just go ahead and email contact at lastweekin.ai.
Andrey is basically just sitting there refreshing the comments section nonstop. So just anything, anything, please.
I am subscribed to so many newsletters, oh my God, to keep up with the news. So I check my emails daily and, like, go through dozens of them, and as a result I do also make sure to read your listener emails. So there you go. Before we get into the news, a quick sponsor read. We are once again promoting the Super Data Science podcast. This is one of the biggest technology podcasts globally. They cover not just data science, but machine learning, AI, data careers, various things.
It is hosted by Jon Krohn, who is the chief data scientist and co-founder of a machine learning company, Nebula, and the author of a best-selling book, Deep Learning Illustrated, and just generally a very, very knowledgeable person when it comes to AI. This podcast has been going on for a long, long time; there are over 700 episodes with all sorts of people. So, you know, this podcast is at like episode 150; that one is at 700. He must have learned everything from all the people he's talked to so far.
So definitely go check it out if you'd like to hear from people in the world of data science or machine learning or AI, kind of a more person-based way to see into the field, as opposed to what we do, which is more news from the media on what's going on.
Yeah, Jon is a great interviewer. And, because he is bald, he never has a bad hair day, and I'm somewhat jealous of that. But yeah, he's a fine gentleman and a scholar, and you should definitely listen to that podcast. And now, kicking off the news, starting with the Tools and Apps section. First story being that ChatGPT is getting memory to remember who you are and what you like. So this was just announced, and this is a feature that will be
starting to roll out. It's not available for everyone, but it's, pretty much what it sounds like. Your ChatGPT chatbot will be able to kind of manually or I guess, automatically remember certain things that come up during your conversations. And each custom GPT will have its own individual memory, allowing for more, personalized experiences across, various spaces. And the last thing is, you will be able to see each
individual snippet of memory. So this is not some sort of neural memory; it's a literal little string of text that sums up a fact about you. And there will be a UI component where you will be able to see each of them, delete stuff that you don't want to be remembered, even individually, and add stuff as well. So yet another new, more product-type feature than an AI-type feature, coming
to ChatGPT. Yeah. And the implementation of this, and the fact that OpenAI sees potential demand for, I don't want to call it interpretability, it's almost an interpretability-type solution, right? Like, what does the system know about me? Obviously, if they're storing it in the form of more like a database of raw strings, and if they're using kind of RAG-like retrieval augmented generation to get the chatbot to just ping that database, then it's not quite the same as neural interpretability. But what I think is kind of cool here is, you know, this does start to introduce a potential, not necessarily business model, but at least use case for consumers for deeper levels of AI interpretability, which I think is great for safety and kind of getting market incentives aligned with some of the important interpretability work that's being done right now. Apparently this memory feature works in two different ways.
You can either explicitly tell ChatGPT to remember specific facts about you, the examples they give being, you know, I always write my code in JavaScript, or my boss's name is Ana. Or alternatively, ChatGPT can just try to pick up those details over time, so kind of implicitly learning from its interactions with you. So you can kind of go either way.
Apparently each custom GPT you interact with will have its own memories, as you said, and there's a whole bunch of measures that they're taking to, again, give greater visibility. I won't say, again, greater interpretability; it's a fuzzy word now because there's this ambiguity as to whether we're getting visibility into knowledge that's stored in a database that just gets queried by the model, or into the model itself. But certainly greater control of what's in that memory. And apparently the system has been trained not to remember things like information about your health by default. That's kind of interesting. So the model has to learn to remember some stuff but not other things. And apparently you can always just ask ChatGPT what it knows about you. We know that by default this memory feature is going to be turned on. This is not the first time that OpenAI has kicked things off with a default-on mode for one of these more permissive, I won't say intrusive, applications, but certainly it is more intrusive in the sense that it's going to remember more about you. So yeah, right now it's just being tested with a small portion of the user population; I suspect we'll be seeing a rollout fairly soon.
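For listeners curious what a string-based memory layer like this might look like under the hood, here is a minimal sketch: a plain store of snippets that gets retrieved RAG-style and prepended to the prompt. This is an assumption-laden illustration; the class and function names are hypothetical and not OpenAI's actual implementation.

```python
# Hypothetical sketch of a string-based memory layer, loosely modeled on the
# behavior described above (explicit "remember this" plus viewable/deletable
# snippets). None of these names are OpenAI APIs.
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemoryStore:
    snippets: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        # Explicit path: the user says "remember that ..."
        if fact not in self.snippets:
            self.snippets.append(fact)

    def forget(self, index: int) -> None:
        # The UI described above lets you delete individual snippets.
        del self.snippets[index]

    def relevant(self, query: str, k: int = 3) -> List[str]:
        # Toy retrieval by keyword overlap; a real system would use embeddings.
        scored = sorted(
            self.snippets,
            key=lambda s: len(set(s.lower().split()) & set(query.lower().split())),
            reverse=True,
        )
        return scored[:k]


def build_prompt(memory: MemoryStore, user_message: str) -> str:
    # Retrieved snippets are just prepended as plain text context.
    context = "\n".join(f"- {s}" for s in memory.relevant(user_message))
    return f"Known facts about the user:\n{context}\n\nUser: {user_message}"


memory = MemoryStore()
memory.remember("I always write my code in JavaScript.")
memory.remember("My boss's name is Ana.")
print(build_prompt(memory, "Draft an email to my boss about the code review."))
```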
I think it's kind of interesting they are adding this. We already had, I believe, the ability to manually add in some instructions that are global, right, like a custom system prompt. Exactly. And this is probably, in practice, not too dissimilar from that feature we already had. But I guess the key difference is that the AI will be able to automatically store some things for future usage without you having to tell it to do that.
So I guess there is a play here to make these, chat bots as individualized as possible. And we've been covering how there's a bit more pressure happening on ChatGPT coming from Gemini. And many people launching chat bots, really. So it seems like we'll probably be seeing more of these sort of product iterations and, user experience iterations rather than like fundamental AI improvements to try and provide the best user experience.
And next we have Reka Flash, an efficient and capable multimodal language model. So, you know, the open source language models that are about 7 billion to 12 billion parameters, those are a dime a dozen; we talk about them all the time. It's rare-ish to have new kind of pre-trained foundation models at scale, and that is the category that this kind of falls into. It's funny, last episode we were talking about, you know, Andrey, you coined the term small large language models, at 7 to 13 billion or something. Well, we were talking about how big is big, and this is a 21 billion parameter model trained entirely from scratch. So, you know, maybe it falls into that category. So this is Reka's model. It's competitive with Gemini Pro and GPT-3.5.
So quite interesting, especially given that GPT-3.5, you know, if you look at the largest version, is a fair bit larger than that. Apparently it outperforms Gemini Pro on a whole bunch of interesting benchmarks, including MMLU, which is a more general-purpose language understanding benchmark. And it's competitive on GSM8K and HumanEval. GSM8K is a math benchmark, so for logical reasoning it actually seems like this model is doing
pretty well. And there's a whole bunch of interesting data that they share about it. It was pre-trained on text from over 32 different languages. There is a more compact variant called Reka Edge; it's going to be 7 billion parameters, and called Edge, of course, because it's presumably meant to be deployed on edge devices. So it's smaller, so it can fit on them.
And they've got a playground that you can play around with it on. I thought what was really most interesting about this is that Reka has built Reka Flash on relatively little in the way of resources. They raised a $60 million Series A so far; typically, you know, that's just not going to be enough money to do anything, certainly nothing at the frontier. Now, this is not a genuine frontier model, right? It's far from that. It's GPT-3.5 level; right now we're on to, you know, GPT-4, really GPT-5. But it is interesting and noteworthy that it's competitive with some models that people do legitimately pay to use, and that is GPT-3.5, and that is Gemini Pro, right? So Reka really is making a bit of a splash with Reka Flash, and it's certainly an impressive model, maybe more impressive than I would have expected from a company that's, you know, funded the way Reka is at this stage.
And they do say that their largest and most capable model, Reka Core, will be available to the public in the coming weeks. Presumably, Reka Core will be at least closer to that GPT-4 or Gemini Ultra level of performance.
If it is, that's going to be really impressive, actually. Yeah.
Yeah, exactly. And, as you said, you can go ahead and try this out in the Reka playground. Reka is a bit more of a, I guess, research group so far, and a bit more focused on API deployments; as far as I can tell, it doesn't seem they're aiming quite as much to be a chatbot provider. But yeah, you can go ahead and play around with it, and it'll probably be at least as good as GPT-3.5, and 3.5 isn't multimodal, actually, right? So it's different in that way, and it's quite impressive. I think, like you mentioned, to our knowledge, at least initially, GPT-3 was like 175 billion parameters when it came out years ago. And the fact that you can squeeze this much performance out of a much smaller model indicates, I think, quite a bit of movement and improvement in our understanding of how to optimize these models with fewer parameters, you know, smaller size, less compute, but still
being as performant. And yeah, that's I guess, what happens in AI, we kind of squeeze out as much performance as we can.
Yeah. That's right. And this is, you know, just to situate this for listeners who might be used to hearing the story of like AI scaling. Right. And bigger is better. Yeah, scaling remains true. But at the same time, we have algorithmic advances that are compounding that. And so you can, as Andre, you just said, like you can squeeze more juice out of the lemon, using the same amount of compute, the same model size. And so that's a big part of what we're seeing here.
And if you are Reka, you are absolutely going to be interested in those kinds of strategies, again, because you're operating on a very limited budget, like $60 million of Series A funding; that's where they're at. That is not enough to compete with, you know, GPT-4, which is estimated, just the training run alone, to have cost anywhere from $40 to $100 million in one shot. And that's just the compute; it does not account, for example, for salaries, which are just gigantic in the space.
So yeah, I mean, it's absolutely something they have to do. I think one thing I thought was really interesting here: the Reka Flash model and Reka Edge were both pre-trained as usual on, you know, a kind of text autocomplete objective. They got instruction fine-tuned, so they got additional training on a data set that consisted of instruction-following data. And then they got RL with PPO, and that's a kind of newish technique that's starting to show up everywhere. But the key thing is they actually used Reka Flash to provide the reward model, the model that essentially evaluates the quality of its own outputs. So they're training Reka Flash, and they're actually going to use Reka Flash to evaluate Reka Flash's own outputs in this kind of reinforcement learning from, well, it's not human feedback now, right? It's a reinforcement learning from AI feedback loop that they've set up. So kind of interesting. Certainly, reinforcement learning from AI feedback is a thing, an increasingly popular thing, but it's noteworthy that it's now kind of full-on being used by them. Like, I don't know that they mention anywhere in the blog post any kind of actual human feedback in the process. So that's kind of an interesting little note there.
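To make the reinforcement learning from AI feedback idea concrete, here is a rough sketch of the loop: the policy model generates candidates and a judge model (possibly the same base model) scores them, with that score standing in for the reward a PPO trainer would consume. The `LLM` callables are placeholders, not Reka's actual API, and this is only a toy illustration of the technique.

```python
# Toy sketch of reinforcement learning from AI feedback (RLAIF): the policy
# model generates completions, and a judge model scores them. Each score
# would then feed a PPO-style update (e.g. something like TRL's PPOTrainer
# in the open-source world). The callables here are hypothetical stand-ins.
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # prompt -> completion


def ai_feedback_reward(judge: LLM, prompt: str, answer: str) -> float:
    rubric = (
        "Rate the following answer from 1 (poor) to 10 (excellent). "
        "Reply with a single number.\n\n"
        f"Question: {prompt}\nAnswer: {answer}\nRating:"
    )
    raw = judge(rubric).strip()
    try:
        return float(raw.split()[0]) / 10.0  # normalize to [0, 1]
    except (ValueError, IndexError):
        return 0.0  # unparseable judgment gets no reward


def collect_ppo_batch(policy: LLM, judge: LLM, prompts: List[str]) -> List[Tuple[str, str, float]]:
    # Each (prompt, completion, reward) triple is what a PPO trainer consumes.
    batch = []
    for p in prompts:
        completion = policy(p)
        batch.append((p, completion, ai_feedback_reward(judge, p, completion)))
    return batch
```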
And just one more quick story in this section. The headline is: Chat with RTX brings custom chatbot to Nvidia RTX AI PCs. And this is from Nvidia. So this is pretty much a tech demo called Chat with RTX, and the idea is that you can download a chatbot and run it locally on your Nvidia GeForce RTX 30-series GPU or higher. So I guess this demonstrates that if you have a gaming computer that has these pretty beefy, but not supercomputer-level, GPUs
at this point, you can definitely go ahead and run a chatbot and customize it and, you know, do whatever you want. This also has retrieval augmented generation and various kinds of accelerations on top of it, to demonstrate all the stuff that Nvidia brings to the table to optimize your inference performance.
Yeah. And it is obviously, like, it runs locally; that's a big part of the logic here of how it works so quickly. It's blazingly fast because it's running on your local computer; there's no pinging the ChatGPT server or whatever. And it's also better for privacy, right? All your data stays on the device. So there's this interesting question of what the future looks like when we look at, you know, customized chatbots, or just chatbots in general.
You know, do we have chat bots that are actually stored locally on whether it's our PC or Mac or our, our GPUs? Like, does it essentially do we have computing happening at the edge in all cases for privacy reasons? For speed reasons? Or does the future look more centralized in the form of a, you know, OpenAI style server serving up a model like this? And right now we don't know. But this is certainly Nvidia playing around with with the idea of the former.
And this allows you to connect all kinds of different open source models, like Mistral or Llama 2. So, you know, you get decent models for sure out of this. One of the examples that they give is, you know, you can imagine asking this chatbot: what was the restaurant my partner recommended while in Las Vegas? And Chat with RTX will actually scan local files on your computer and point you to the answer with the context.
And so this is kind of like, you know, Google search for your own computer in a way, or maybe more like, you know, ChatGPT for your own computer. But certainly kind of interesting. And again, for those privacy reasons, maybe something people would be more keen to deploy into.
Exactly. Just to add a bit more detail, what this looks like in practice is a GUI application, so you don't need to be a developer and run terminal commands or anything. It has little dropdown menus for everything. There are currently, I think, just Mistral and Llama 2 in that dropdown menu; they'll probably add more open source options as they become available. And there is a little UI to be able to say: have access to these files when you're answering my question.
And it will then use that retrieval stuff that is built in to be able to answer questions relevant to your files. So yeah, it's a fun little tech demo and a fun little thing to try if you're, someone with a decent GPU.
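For a sense of what "chat with your local files" amounts to, here is a minimal retrieval-augmented generation sketch: index the files, pull the most relevant chunks for a question, and stuff them into the prompt for a locally served model. The `local_llm` argument is a hypothetical stand-in for whatever runtime actually hosts the model (TensorRT-LLM in Nvidia's demo), and the TF-IDF retrieval is a stand-in for a proper embedding index.

```python
# Minimal sketch of "chat with your local files" via retrieval-augmented
# generation. TF-IDF keeps the retrieval dependency-light; Nvidia's demo uses
# TensorRT-LLM plus a real embedding model. `local_llm` is a hypothetical
# stand-in for the locally served Mistral / Llama 2 model.
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def index_folder(folder: str):
    # Read every .txt file under the folder (assumes there is at least one).
    docs = [p.read_text(errors="ignore") for p in Path(folder).glob("**/*.txt")]
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(docs)
    return docs, vectorizer, matrix


def answer(question: str, folder: str, local_llm, k: int = 3) -> str:
    docs, vectorizer, matrix = index_folder(folder)
    scores = cosine_similarity(vectorizer.transform([question]), matrix)[0]
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
    context = "\n---\n".join(docs[i][:2000] for i in top)  # truncate long files
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return local_llm(prompt)
```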
And now moving on to Applications and Business. And, you know, it's a Wednesday, it's about time. We haven't heard from OpenAI or Sam Altman in a while; we're about due for, you know, Sam Altman getting up and saying he wants $7 trillion to just, you know, reshape the entire economics of semiconductor fabs. So that's where we're at. This headline is: Sam Altman seeks trillions of dollars to reshape business of chips and AI.
And this is all coming from, you know, people-familiar-with-the-matter type sourcing, so it's not an official announcement. But you might remember from previous episodes we've talked about Sam talking to, apparently, folks in the UAE, the United Arab Emirates, about some chip project; it wasn't super clear. It now seems as if he's been asking them for help raising as much as 5 to 7 trillion, with a T, dollars for whatever this chip project is. We're getting a bit of information about that now.
For context, and this is funny, every couple of articles about this, they always do this, they always start listing all these comparables. They're like, okay, just to put this in context, right: this would dwarf the current size of the global semiconductor industry. Global sales of chips were half a trillion dollars last year, right? Remember, he's raising 5 to 7 trillion with a T. So half a trillion is the annual kind of global
sales of chips. It's expected to go up to about 1 trillion annually by 2030. So that's kind of like, you know, he's going to be raising like 5 to 7 times as much as the whole market is going to be worth in 2030. And then global sales of manufacturing equipment for semiconductor chips. So these are like the kind of ASML type companies that we talked about before were $100 billion last year. All this stuff is like tiny, tiny, tiny.
Again, more context: this is larger than the national debt of some major global economies and bigger than giant sovereign wealth funds. Okay, so one problem that comes to mind, putting my startup hat on for this. First of all, you never, ever bet against Sam. That much is clear. You never, ever bet against a founder, because when you're wrong, you're very, very badly embarrassed. But one of the big challenges that I would imagine
starts to kind of arise when you're looking at any big move like this: this is a space with an extremely high technical floor, where you just need a ton of technical knowledge to get involved. So when you're talking about plowing, you know, 5 to 7 trillion dollars into this, at a certain point you've got to imagine you're going to be bottlenecked by talent as much as infrastructure. So, you know, what the bottlenecks turn out to be
is going to be really interesting to track. Like, I'm curious what Sam himself thinks. But apparently he's been meeting with all kinds of folks, not just the UAE. The Commerce Secretary, Gina Raimondo, who's come up a lot on the podcast, of course; they've talked, and apparently had a
productive conversation. And I guess the deal details, as far as we know right now: it looks like OpenAI is basically saying, look, we're going to set up some kind of, I'm going to call it a consortium or whatever, but some kind of partnership with all the big fabs, you know, TSMC and so on, and we're going to agree to be a significant customer of these new factories. And so, you know, they're going to fund a lot of
the effort with debt. But it's all based on the promise of OpenAI, among others, growing really fast and coming into this high demand. The UAE is a hugely important government for this just because they have so much cash. But apparently Sam also met with Masayoshi Son, who is the famous CEO of SoftBank.
And he also met with TSMC. So really, all the folks you'd imagine you'd want to meet with if you're raising some gigantic amount of money like $7 trillion. Unclear whether they'll be able to do it, whether it will work if he does, but it is the sort of thing that, you know, if you look at the way Sam Altman has been thinking about this, it makes perfect sense.
This is what you do if you think scaling gets you to AGI. You do not place a $7 trillion bet, which, again, for more context, is roughly a quarter of US GDP. You don't raise that kind of money for AI hardware if you don't think that the returns are going to be, and I can't believe I'm saying this, literally on the order of the US GDP. That's what the returns would have to be in your mind
if you were going to do this. And for those of us who think AGI might be happening soon, that actually makes perfect sense. But this just reflects Sam's kind of continued doubling down on this sort of theory of the case.
Now, I do want to say we didn't cover this story initially last week, although it was already out there, but it got quite a bit of play throughout the media. And it's worth pointing out that this number, 7 trillion, is coming from a quote in the article that says the project could require raising as much as 5 trillion to 7 trillion, one of the people said, the people being the informed sources here. So it's worth keeping in mind that this is very early stage, right?
The comment from the OpenAI spokeswoman was that OpenAI has had productive discussions about increasing global infrastructure and supply chains for chips, energy, and data centers, which are crucial for AI and other industries. So there are discussions being had to increase the supply chain, to build more foundries. Sure, it could require raising as much as 5 trillion or 7 trillion, but it could also wind up being much less, right? It's just worth keeping in mind that this is all very nebulous.
This is a broad direction he is pushing in, and it's not like he is setting out to get $7 trillion right now, at least as far as I can tell from the details so far. That part of it has been a bit overblown, but the fact that he is seeking to do this very, very capital intensive thing of creating more sources for chip production, that is certainly true.
And, yeah, as you said, Sam Altman, given his position as a lead at OpenAI, a very influential figure, very famous figure, now having been quite public, you know, if anyone can try and do it, I guess it would be him.
Yeah. No, and good point about, as we talked about, the sort of uncertainties around who's saying what and whether this is real. It does, for what it's worth, align with my understanding of the costs involved. If you really wanted to crank up hardware production, like semiconductor production, this is not an insane number to be thrown around if you expect, you know, say, two orders of magnitude of growth in the space, which, again, on that AGI hypothesis.
Yeah. Pretty, pretty plausible. But but you're right. I mean, there's tons of uncertainty. If nothing else, this forces us to have an interesting conversation about where the bottlenecks in the semiconductor manufacturing cycle are right now. And, you know, we talked about talent a minute ago.
There's also this question of, you know, what about the rare earth minerals, like gallium and germanium especially, which are, well, mostly produced in China, and certainly mostly refined in China. And, you know, what can we do to massively increase that? Is money alone enough? And when you start talking about investments of this scale, you're actually talking about moving the market price of these things, too. And so they become more scarce, become more expensive. There's sort of this nonlinear compounding effect that happens. So anyway, yeah, I totally agree. I think this is just going to be a space we'll have to keep an eye out for. And, you know, even $1 trillion is going to be pretty insane if that happens. But it may not happen at all. And, yeah, we'll just have to wait and see.
And to be sure, I'm sure some people would love to be able to raise 5 to 7 trillion, and he wouldn't mind if that was possible. And it does seem he is going in that direction, so that's worth noting. Yeah. And we won't go into it, but it is, I guess, also worth noting that the Nvidia CEO did have a chance to comment on this, kind of, a little bit. And as you might expect, his general comment was like, well, you know, GPUs will get more efficient, all of this is probably overkill, or something in that direction, kind of underplaying the need for this and, gently, the entire effort.
And onward to the lightning round. Lightning, lightning, lightning. Sorry, I don't know why I did that. All right, so the first article here is a report that says China's SMIC to begin production of five nanometer chips for Huawei. Okay, this is actually a really big story. We usually go through this song and dance anytime we talk about, you know, a five nanometer process. And what does that mean?
So just to really quickly summarize: right now there are three of what are called node sizes that you need to know about, that humans currently know how to make. So we know how to make semiconductor chips down to, roughly speaking, three levels of precision. We have a seven nanometer process; this is the process that was used to make the Nvidia A100 GPU, which GPT-4, by the way, was trained on. We have a five nanometer process, which is used to make the H100 GPU and all current top-line GPUs.
This is presumably the process being used to train GPT-5. And then there's the three nanometer process, which is currently being used for the iPhone, basically. And the only people who make the three centimeter, sorry, three nanometer process, is TSMC, the Taiwan Semiconductor Manufacturing Company. They are absolutely the world leader on this. Now, other firms have since started to close in on the five nanometer process themselves. And five nanometers is
really challenging. It can require new kinds of technology that are export controlled, that China cannot now legally get its hands on. It usually requires these devices called extreme ultraviolet (EUV) lithography machines that are only really made by a Dutch company called ASML. So if you want to go to five nanometers, if you want to build, in other words, those Nvidia H100-equivalent chips, if you want to train a model like GPT-5, you're going to need to find a way to get access to EUV lithography, unless you make some breakthrough. And that is exactly what seems perhaps to have happened.
We've talked about this on the show before, trying to figure out what's going on with this Chinese company, SMIC, which is a competitor to TSMC, the actual world leader in semiconductor manufacturing; it's a Chinese version of it, kind of China's best play here. They apparently are managing to use their existing stock of U.S.- and Dutch-made equipment to produce five nanometer chips. This, if true, would be a significant breakthrough.
The big question now is about whether they're going to be able to achieve enough yield. So you might be able to make a five nanometer chip, but your process, if it's inefficient, if it you know, if it only works 10% of the time, then the cost per chip is going to be ten times higher. So that just may not be economically viable.
Right now, it seems like the yield from SMIC may actually be good enough to start shipping on this five nanometer process that they're working on. They've already partnered with Huawei, and kind of surprised the whole industry when the Mate 60 Pro premium smartphone launched with a seven nanometer process, which they were not supposed to be able to do either. That was back in August.
Right now it looks like there are some Kirin chips being designed by Huawei's HiSilicon unit that may actually end up being five nanometer chips. So we're going to have to watch this very closely. But this would be a significant breakthrough, both from a consumer standpoint in China, from a domestication-of-the-AI-supply-chain standpoint in China, and from the national security perspective.
And for context, Huawei, in addition to working on phones, is also one of the big players in AI hardware. They have their Ascend AI chips. And, you know, over in the West we have AMD and Nvidia as the major players, with export controls that pretty strictly prevent powerful GPUs from being shipped, as we've been covering.
It will be a pretty significant deal for China, for Huawei, to get access to this better process node and then to be able to apply it to AI hardware, so that, you know, export controls don't matter quite as much, basically. Next story, and this is kind of a spicy one: a crowd destroyed a driverless Waymo car in San Francisco. So no huge ramifications here, but I do think a fun story to cover.
And it is really what it sounds like. During celebrations of Chinese New Year in San Francisco's Chinatown, a Waymo car kind of got stuck. There were fireworks going around, so basically it just stood there, staying still, while human drivers drove around and left the area. And at some point, someone decided to go ahead and throw a firework into one of the open windows of the car. And there are a ton of photos and videos you can find online of the car just being fully engulfed in flames. Like, completely on fire. No explosions as far as I'm aware, but it was totally on fire. The fire department had to come out, and you can see how it was basically melted. Yeah, pretty dramatic. And it kind of builds on prior events of people messing with self-driving cars. There was, not a movement per se, but a bit of a trend of putting cones on self-driving cars just to mess with them. So yeah, this totally happened in San Francisco.
And I think speaks to there being a lot of Waymo cars in San Francisco. And the people there are still kind of adapting to them and responding to them in different ways, including this act of vandalism that, you know, there's no specific person behind it. It was kind of a crowd, presumably just for fun, really. During the celebration, someone decided to go for it.
I love your euphemisms there. That was some great, like, press secretary lingo. People still getting used to these cars and, you know, in the way that one does, beating the living shit out of them. Yeah. Actually, it's funny, I was on Twitter, I guess X, and I saw, I was still trying to figure out whether it was this one, there was another video of a bunch of people kind of, you know, beating the crap out of one of these cars; one guy had a skateboard and he was just, you know, hammering away. Maybe this was just before it was lit on fire or something, I'm not sure. But yeah, anyway, it definitely seems to be stirring up a lot of emotions, understandably, as we look at automating away a whole
category of jobs here. And there is something faintly dystopic about, like this particular episode of the Black Mirror series that we're running in 2024, just that, you know, you were seeing accidents and stuff like that with no humans responsible and so on and so forth. So, yeah, I mean, I'm curious what kind of PR push we're going to have to see from some of these companies to get people more comfortable with the idea. You know, it's no longer just about
dollars. And, like, you know, how much can you lower the cost of a cab ride, but also just, you know, how can you reduce the odds that random people are just going to try to light these things on fire because they're upset, whatever sort of existential angst they're experiencing for other reasons, too. But, yeah, really interesting and, freaky, freaky part of the show.
Yeah. Part of me wonders if there is kind of a bit of, not just tech backlash, but AI backlash, brewing. And, you know, self-driving cars are a pretty clear and present sign of it that's out there in the physical world. So, yeah, it would be interesting if that kind of is part of a general cultural movement of our time, so to speak. But anyway, cool story. And if you want to see a melted car or a self-driving car on fire, go to the link we provide or just search
for it. Next story: OpenAI reportedly developing two AI agents to automate entire work processes. So this is kind of some insider info, not something OpenAI announced or released. But, as the headline said, this is basically looking at actual AI agents, as opposed to something like ChatGPT, which, as we've been saying, is still more of an AI model, categorized as something where you give it an input, it gives you an output, and that's sort of it.
An AI agent is something that you can tell to do something, and it can go off and sort of execute for a while autonomously, you know, getting observations of the environment or the internet or whatever you want, deciding on actions, taking them, seeing the update or the result, and doing that in a loop. So far, OpenAI hasn't released any agents, at least not since their work in reinforcement learning back in the day, when they were playing Dota and so on, before all this AI hype. And so these AI agents are meant to, in one case, take over a user's device to perform tasks such as transferring data between documents and spreadsheets. And in the second case, it's more web centric.
So it's designed to perform web based tasks such as collecting public data, creating travel itineraries, or booking airline tickets. And we've seen some examples of these things before from other players in the space, especially for browsers.
We've seen demos and examples of saying, you know, book me a ticket to Atlanta, and there is an agent who just knows to go to the right website, do the right Google search, click the right forms, fill in the text, etc., etc. So not too surprising to hear that this is an initiative at OpenAI. And yeah, I guess we'll have to keep an eye on it and hopefully see something concrete and official soon.
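As a rough illustration of the observe-decide-act loop being described, here is a bare-bones agent sketch. The `llm` call and the tool names are placeholders; this is not OpenAI's unreleased agent framework, just the general pattern.

```python
# Bare-bones agent loop: observe, ask the model for the next action, execute
# it, and feed the result back in, until the model declares it is done.
# `llm` and `tools` are illustrative placeholders, not a real agent API.
import json
from typing import Callable, Dict


def run_agent(goal: str, llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]], max_steps: int = 10) -> str:
    history = f"Goal: {goal}\n"
    for _ in range(max_steps):
        decision = llm(
            history
            + 'Respond with JSON: {"tool": <name or "finish">, "input": <string>}'
        )
        step = json.loads(decision)          # assumes the model returns valid JSON
        if step["tool"] == "finish":
            return step["input"]             # final answer
        # e.g. tools = {"web_search": ..., "click": ..., "fill_form": ...}
        observation = tools[step["tool"]](step["input"])
        history += f"Action: {step}\nObservation: {observation}\n"
    return "Gave up after max_steps."
```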
Yeah. And, you know, I think the whole industry is kind of moving in this direction. There are a couple of big trends we've seen, right? Trends towards multi-modality, trends towards agent-like behavior. This is certainly the latter. There are a whole bunch of companies pursuing this: OpenAI is, Google with their Bard assistant certainly is. But there's, like, Imbue, you know, Rabbit, Adept AI, all these companies. It really seems like this is where things are going, both because it's a better user experience, but also because increasingly you want these language models to be taking actions in the real world. And, you know, the agent framework is just so directly connected to that, or action-oriented, I should say.
Yeah. So it's also worth noting, you know, these agents are probably going to be interacting with each other a lot, right? We already know that OpenAI has the ability, or allows users, to combine the capabilities of different GPTs. And so, you know, this is probably going to be a very interactive thing; they're not going to operate in isolation. But, yeah, that's definitely something to keep an eye on.
And one more story for this section, going back to self-driving cars. Cruise names first chief safety officer following crash and controversy is the headline. So we've covered the controversy quite a lot; I won't go into it too much, I'll just say there was a significant crash that happened last year that really messed up Cruise's overall fortunes, and was very impactful for the space in general. So as a result, it's worth highlighting that they have named this first chief safety officer.
This is Steve Kenner, who has been in the autonomous vehicle industry for a while, has previously held top safety roles at various companies, and will be reporting directly to the president and chief administrative officer. Generally, I guess, helping to both actually improve the focus on safety and to help Cruise recover their reputation as they try to go back out on the roads, and do it in a way that goes better this time, I suppose. Yeah.
And they're positioning Steve Kenner here, it seems appropriately, to report directly to the highest levels, Cruise's president in this case and their chief administrative officer. So that sounds reasonable to me, at least based on what little I know of their corporate structure. But apparently he's got a ton of experience; he's had top safety roles, apparently at Kodiak, Locomation, as well as Uber's now-defunct self-driving division. So he's definitely got a lot of experience. Hopefully this helps them with the headlines, if at least that, and then, obviously, with more safety, hopefully the company can do better long term.
Onto Projects and Open Source. The first story is Cohere for AI launches an open source LLM for 101 languages. So Cohere for AI is a nonprofit research lab that has been around since 2022, so for a while, and they have now unveiled Aya, this open source language model that can support all of these languages. They have also released the Aya dataset, which has a ton of annotations. It is a huge endeavor. They had teams and participants from 119 countries contributing to this, and as a result have 513 million instruction fine-tuning annotations, data labels, to be able to train this model. So a pretty significant release as far as data goes, for sure. And as far as models go, it's yet another open source one, but in this case definitely more optimized for things beyond English, which is not a major focus for most things we've covered so far.
Yeah, that's right. It definitely is a weak spot that a lot of people have identified. You know, you have what are known as low resource languages. Of course, English not only is the most valuable from the standpoint of just having a larger global population that speaks English, but also because, well, as a result of that, you have way more data. So low resource languages, well have less data. And that's what this model is designed to help address.
It is a pretty remarkable exercise, like a pretty remarkable effort. They describe, yeah, this kind of open access data set they're creating, they're calling it the Aya Collection: 513 million prompts and completions across 114 languages. Apparently there are rare, human-curated annotations from fluent speakers for rare languages. So it's kind of cool.
And they have benchmarked this model against other classic multilingual models, including a model called mT0, which is a sort of prominent multilingual model; they turn it into mT0x with a little bit of extra training just to make it more fairly comparable, and mT0 itself is a variant of mT5. And they find that Aya compares quite favorably: it's preferred 77% of the time on average over these other models. And that's a significant delta, so that does mean it's a significantly better model. Yeah. This is also notable because it's coming from Cohere, right, Cohere for AI, which is Cohere's kind of community ecosystem for open source projects and sort of AI-for-good type stuff. So in a sense a kind of marketing exercise, or at least it can be chalked up as a marketing expense by Cohere, which of course is a competitor to OpenAI in the kind of LLM space.
So yeah, interesting that they're dedicating resources actively to this. It's a really interesting project. They do have to fix their sign-up button, though, because the text overflows from the button. So just a helpful little tip there on the UI/UX side, but yeah, looks really cool.
Yeah, it's a pretty cool release. And yeah, to be very clear, I might have misspoken and said just Cohere; this is from Cohere for AI specifically, which is a nonprofit research lab established by Cohere, the big for-profit company that focuses on enterprise LLMs.
And there is, by the way, a really cool analytics dashboard that you can check out on their website, just showing some of the regional analytics, like, you know, the number of submissions that they got from different regions, information about how the project is being taken up in different places, how many submissions per language, all that jazz. So if you're curious about that, it's actually worth checking out.
They've done a great job laying this out.
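If you want to poke at Aya yourself, a hedged sketch of loading it with Hugging Face transformers might look like the following. Aya is built on an mT5-style seq2seq backbone, so it loads as a Seq2SeqLM; the checkpoint name used here is an assumption, so check Cohere for AI's model card for the published identifier.

```python
# Hedged sketch of trying the Aya model with Hugging Face transformers.
# Aya sits on a seq2seq (mT5-style) backbone, so it loads as a Seq2SeqLM.
# The checkpoint name below is an assumption; check the official model card.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "CohereForAI/aya-101"  # assumed hub name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map="auto")

# Instruction in a non-English language (Turkish: "Write a short poem about the sea.")
inputs = tokenizer("Deniz hakkında kısa bir şiir yaz.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```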
Next story: BuD-E, pronounced "buddy", enhancing AI voice assistants' conversation quality, naturalness and empathy. This is a project from LAION, which is a major institution that has done a few major projects before. They were largely responsible for the training data for Stable Diffusion, so a lot of what kicked off the text-to-image hype and progress comes in part from this organization, LAION. So they, in collaboration with several other groups, are developing this system.
BuD-E, this AI voice assistant. They have created a baseline voice assistant that has low latency, so 300 to 500 milliseconds, and they are working on getting it lower, below 300 milliseconds. The whole project is open source, and they are building a data set of natural human dialogs and, you know, various quality surveys. So, pretty cool project.
It does highlight that, you know, to go from just a chatbot to an AI voice assistant, there's quite a bit of engineering and additional data and additional optimization required. So interesting to see them working with several other groups to make it happen and make it more optimized. One hundred percent.
And as we've talked about on the show before, you know, when you talk about voice assistants, one of the key metrics you always look for is latency, latency, latency. Right? Because it's very awkward if you say something and it takes like three seconds for the system to respond; it really ruins the user experience. That's why the focus is so intense on the latency piece. To your point, they're targeting response times below 300 milliseconds, even with models like Llama 2 30-billion-parameter models, and that is pretty impressive. And for context, one of the things that makes this really hard, especially when you look at the voice assistant space, is what happens once the text gets generated. Text-to-speech systems normally take in entire sentences, so that they have enough context to
produce a response. You can sort of think of that in the context of speech pretty clearly, because sometimes, you know, you don't know what the inflection ought to be on a particular word until you know how the sentence ends. And that's one of these irreducible problems when you talk about these systems that are going to speak to us. So one of the key things that they've looked at is finding ways to get their text-to-speech to develop context from hidden layers of the large language model, so they can kind of short-circuit the processing a little bit and obtain a faster output in that way. So, really interesting workarounds that they've set up. Again, this is really AI hardware meets AI software; I think that's going to be the theme of 2024 and, frankly, beyond. But interesting that they've already gotten this crazy 300 to 500 millisecond latency. And that's with the Phi-2 model, by the way, so not a super small one; I think that's around 3 billion parameters, so not tiny. Yeah. For them to get to under 300 milliseconds with a 30 billion parameter model, that is, you know, if you can do that, then you can do some really interesting things with voice models.
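To illustrate the latency trick being described, here is a small sketch of streaming synthesis: instead of waiting for the full LLM reply, hand each complete clause to the TTS engine as soon as it appears. The `stream_tokens` and `synthesize` callables are hypothetical placeholders, not LAION's actual BuD-E code.

```python
# Sketch of the latency idea described above: start text-to-speech on the
# first complete clause instead of waiting for the full LLM reply.
# `stream_tokens` and `synthesize` are hypothetical placeholders.
import time
from typing import Callable, Iterable


def speak_streaming(stream_tokens: Iterable[str],
                    synthesize: Callable[[str], None]) -> float:
    """Speak clauses as they arrive; return time-to-first-audio in milliseconds."""
    start = time.perf_counter()
    first_audio_ms = None
    buffer = ""
    for token in stream_tokens:          # tokens arrive as the LLM generates them
        buffer += token
        if any(buffer.rstrip().endswith(p) for p in (".", ",", "?", "!")):
            synthesize(buffer)           # hand a complete clause to the TTS engine
            if first_audio_ms is None:
                first_audio_ms = (time.perf_counter() - start) * 1000
            buffer = ""
    if buffer:
        synthesize(buffer)               # flush any trailing text
    return first_audio_ms or (time.perf_counter() - start) * 1000
```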
That's right. Yeah. This is kind of interesting to me also as an announcement of a project more so than a release of something. So they released a baseline, basically their first product in this line of work. But in this blog post they are very much also inviting people to contribute as an open source project. They are inviting open source developers, researchers, and enthusiasts, and they have a whole roadmap with a ton of stuff they do want to add: quantization, various optimizations. They are still intending to work more on the data set and so on. So there's a lot of work still to be done in this project, and it is, yeah, more of a new initiative that they are pushing towards. So if you are looking for an open source project to contribute to, I guess, FYI, this is one that's out there. It is open source, there is a demo, but you would, I think, have to go and look at the code and mess around if you want to try it out yourself.
On to the lightning round with just a couple more stories. The first one is EVA-CLIP-18B: scaling CLIP to 18 billion parameters. So CLIP, contrastive language-image pre-training, is something we haven't mentioned a lot, but, you know, if you've been around AI for a while, you know it's kind of a big deal. It was from back in, maybe 2022, around the time that DALL-E 2 came out, text to image. This was another very significant model from that time that had contributed a lot to the progress of text-to-image and just AI research in general. CLIP models basically are able to compare text and images and say: how well does this text describe this image? And there's a lot of downstream applications for that. You can do classification, but you can also use it in various ways for training image generation and stuff like that. So this is scaling that to 18 billion parameters.
What they say is the largest and most powerful open source CLIP model to date. It achieves really good results. It's open source, the data set is openly available, and that data set is actually smaller than the in-house data sets employed for other CLIP models. So pretty significant, I guess, as this sort of tooling/infrastructure system thing. Not something most people use directly per se, but an important type of model for a lot of applications.
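For anyone who hasn't used CLIP before, here is a quick sketch of the core operation: score how well each of a few text candidates describes an image. This uses the small original OpenAI CLIP checkpoint via transformers as a stand-in; EVA-CLIP-18B is the scaled-up model discussed here and ships with its own loading code.

```python
# Quick sketch of what a CLIP model does: score how well each text candidate
# describes an image, which doubles as zero-shot classification. Uses the
# small original OpenAI CLIP checkpoint as a stand-in for EVA-CLIP-18B.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # any local image
texts = ["a photo of a dog", "a photo of a cat", "a diagram of a transformer"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-to-text similarity scores
probs = logits.softmax(dim=-1)                 # "classification" over the candidate texts
for text, p in zip(texts, probs[0].tolist()):
    print(f"{p:.2f}  {text}")
```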
Yeah. And to your point, right, CLIP is like an image classification model, but it's a little more flexible than the classic image classification models that we used to have back in the day, where you would just have a label, or you might have like a thousand different categories, classes, that you want to associate with a given image, and you would label the images with one of those thousand different categories. One of the problems with that approach is that you're limited to those thousand different categories, and so your vision model ends up not being able to generalize as well out of that distribution. And CLIP was one of the first ways around that. Actually, OpenAI built the first CLIP model back in, I think it was 2021 or early 2021, at the same time, by the way, I think the very same day, as they announced the first DALL-E model. In 2021, really?
I thought it was 2022. Was it, maybe January 2022? Going to the trusty AI tracker to find out. Clip is 2021.
Wow. Time flies.
Yeah, I know, right? It feels like it should have been 2022, but yeah. And so, as you were saying, it is a more general thing. It allows you to associate kind of a longer text description with an image. And CLIP models have been used, like you said, not just for straight classification; they're often combined with models like DALL-E to rank the outputs of those image generation models, so that you effectively get overall a more effective model on the whole. One last thing I'll just mention: they have a scaling curve that they show at the very top of their paper, their figure one. They're showing the scaling behavior: as they increase the number of model parameters, what happens to the zero-shot accuracy of their model in classifying images. And what's interesting about it is the scaling curve does not seem to be bending. It seems to be very healthy, going from, you know, say 75% accuracy all the way to 82% accuracy, which implies there's a lot of juice left in this particular lemon. So it's a really interesting advance. The team, by the way, is from Beijing, at the Beijing Academy of AI, which you can sometimes think of as like China's leading AGI lab, and also Tsinghua University, which has an open affiliation with the People's Liberation Army, the Chinese military.
So, sort of an interesting development here, and definitely another flex for Chinese AI research.
And onto our last open source story, this one from Stability AI, and it is introducing Stable Cascade. Stable Cascade is a new text-to-image model that is essentially kind of an alternative to Stable Diffusion. It builds on a different architecture: where Stable Diffusion basically does the whole generation end to end, this has these stages, a cascade, and that leads to potentially better results. As highlighted in this blog post, they are releasing this as noncommercial only. So this is not going to be deployed in popular applications out there that allow you to generate images; at least officially, you're not allowed to. But yeah, if you go look at the blog post, the results are quite impressive. They show in particular a lot of faithfulness to the prompt and being able to follow your instructions very carefully.
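For the curious, a hedged sketch of what running the two-stage cascade looks like: a prior stage maps the prompt to a compact image embedding, and a decoder stage turns that embedding into pixels. The pipeline classes and checkpoint names below follow what the diffusers library documents for Stable Cascade, but treat them as assumptions and check the model card (and the noncommercial license) before relying on them.

```python
# Hedged sketch of the two-stage cascade idea: a "prior" stage generates a
# compact image embedding from the prompt, and a decoder stage turns that
# embedding into pixels. Names follow the diffusers docs for Stable Cascade,
# but verify against the current model card before using.
import torch
from diffusers import StableCascadeDecoderPipeline, StableCascadePriorPipeline

prior = StableCascadePriorPipeline.from_pretrained(
    "stabilityai/stable-cascade-prior", torch_dtype=torch.bfloat16
).to("cuda")
decoder = StableCascadeDecoderPipeline.from_pretrained(
    "stabilityai/stable-cascade", torch_dtype=torch.float16
).to("cuda")

prompt = "an anthropomorphic fox in a lab coat, studio lighting"
prior_out = prior(prompt=prompt, num_inference_steps=20)   # stage 1: prompt -> embedding
image = decoder(                                           # stage 2: embedding -> image
    image_embeddings=prior_out.image_embeddings.to(torch.float16),
    prompt=prompt,
    num_inference_steps=10,
).images[0]
image.save("stable_cascade.png")
```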
Yeah, it's really impressive. They also have some progress that they're touting on inference speed, too. It seems like they've managed to cut down relative to other models, though I don't know if it's apples to apples; they're doing a 50-step to 30-step comparison. Anyway, it's definitely an interesting advance, maybe a modest advance, on inference speed, as far as I can tell just from looking at the figures.
Yeah, cool advance, and interesting to see Stability continue to pump these things out. The images do look good, I will say. I mean, there's nothing obviously wrong with any of the faces or hands or anything like that. So, another big leap forward for Stability AI. And moving on to Research and Advancements, we start with Self-Discover: Large Language Models Self-Compose Reasoning Structures. Let's talk about prompting for a second.
Usually when you have a language model, you have to come up with some kind of prompt to get it to behave optimally. Right? You have a problem you want it to solve. It's not the case that you can usually just straight up ask the model to solve the problem and it'll do it perfectly. You know, that sometimes works, but often, especially for more complex tasks, you have to try techniques like chain-of-thought prompting, where you tell the model, hey, I want you to solve this problem step
by step. Like, let's think about this step by step; you know, give me your reasoning explicitly, and then, based on that reasoning, kind of guide yourself, step by step, towards the answer. There are a whole bunch of other strategies. Self-consistency is another one, right? You do chain of thought, you get the model to lay out its thought process and get an output, and then you do that a bunch of times, or with a bunch of different models, but usually a bunch of times with the same model. And then you evaluate for the most self-consistent thought process, if you will, and use that output; that's called self-consistency. And there are a whole bunch of other techniques around few-shot learning, etc. And the argument that the authors of this paper are going to make is that these techniques are not universal. So it's not the case that you're always better off using self-consistency, or always better off using chain-of-thought prompting or some other
technique; depending on the problem that you're facing, sometimes you want to go with one, sometimes you want to go with another. You know, manipulating simple symbols might call for a different prompting strategy than doing arithmetic or writing poetry, right? So there's this notion that maybe what we ought to do is figure out, as a first step, before we jump into just using a given prompt, what is the underlying reasoning structure that this task requires. And that reasoning structure might involve chain-of-thought prompting, or it might involve a combination of different techniques. Right. So that's essentially what they're going to do.
They're going to try to first have their model pick from a set of atomic reasoning modules, like chain-of-thought prompting, like self-consistency, and so on, and then compose them together in a coherent way that solves a given problem class that we've given to the system. Right. So essentially, you have reasoning modules that are really good at breaking problems down into subtasks, others that are really good at critical thinking, and so on. And the idea here is the language model is going to first select the reasoning modules that are most relevant, make small adaptations to them for the specific task at hand, and then actually implement them, kind of prompting itself with this composite prompt that invites it to follow a particular architected reasoning process.
And the results are pretty impressive. What we end up seeing is it outperforms pure chain-of-thought prompting on the vast majority, like over 80%, of tasks that they tried. In some cases the performance gains are up to 42%. Again, this is just with a prompting strategy. They also compare it to other techniques that are known as inference-heavy techniques.
So these are techniques that require you to run a lot of inferences, to run your model many times to generate many outputs, and then to compare those outputs. Self-consistency is one of those, right? You rerun your model many, many times and then you compare: okay, which of these things are most self-consistent, which of these reasoning flows look best? And then you pick that. Well, that requires you to run your model many, many times at inference. So the challenge with those heavy-duty, inference-heavy methods is they're very expensive and time consuming. Right, they take a lot of compute.
And so they've done a bunch of things in the background that we won't necessarily go into to optimize the efficiency of this process, but ultimately they're able to use 10 to 40 times less inference compute to compete with combined methods like chain-of-thought prompting plus self-consistency.
So a really impressive way of increasing the overall performance of models with this meta-strategy where, you know, before you dive into just picking a given prompting technique, you have the model think about: what are my options, what are the prompts I could give myself, and how can I compose those together intelligently to solve this problem?
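Here is a compressed sketch of that select, adapt, implement flow. The `llm` callable is a placeholder for any completion API, and the module descriptions are paraphrased from the paper's spirit rather than quoted, so treat this as an illustration of the meta-strategy, not the authors' exact prompts.

```python
# Compressed sketch of the Self-Discover flow described above: the model first
# SELECTs relevant reasoning modules, ADAPTs them to the task, IMPLEMENTs a
# reasoning structure, and only then solves the instance. `llm` is a
# placeholder for any chat/completion call; module texts are paraphrased.
from typing import Callable, List

REASONING_MODULES: List[str] = [
    "Break the problem into smaller sub-problems.",
    "Think step by step (chain of thought).",
    "Critically examine assumptions and check for errors.",
    "Propose multiple candidate solutions and compare them (self-consistency).",
]


def self_discover(llm: Callable[[str], str], task_examples: str, instance: str) -> str:
    selected = llm(
        f"Task examples:\n{task_examples}\n\nFrom this list, select the reasoning "
        f"modules most useful for this kind of task:\n{REASONING_MODULES}"
    )
    adapted = llm(
        f"Rephrase the selected modules so they are specific to the task:\n{selected}"
    )
    structure = llm(
        f"Turn these adapted modules into a step-by-step reasoning plan in JSON:\n{adapted}"
    )
    # The discovered structure is reused for every instance of the task,
    # so the three calls above are amortized across a whole benchmark.
    return llm(f"Follow this reasoning plan:\n{structure}\n\nProblem:\n{instance}")
```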
Yeah, so definitely more applicable to, I guess, reasoning-heavy tasks. As usual, research works on prompting strategies tend to focus on pretty tricky problems where typically the LLM would get it wrong; even if you do tell it to think step by step, it would just kind of mess up along the way. Like the highlighted example in figure seven: there's an SVG path element, and then a bunch of coordinates, and then they ask which shape this draws, a circle, heptagon, hexagon, kite, line, etc. So you can imagine how, given the coordinates of the points, you would have to then imagine in your head: okay, here is this line that gets drawn, here's that line, what do the lines come together to represent? Trying this by default, you would not get a result that works. If you add this structure, kind of really engineered on top of the LLM, to enforce a certain way of being more careful, I guess strategic, about how you break down a task, you can then get this result. So I think it's generally applicable if you want to address pretty tricky types of problems with LLMs, for whatever reason, and you don't want to have a model that is optimized specifically for that task. It's, to me, kind of interesting to see continued research along this line of augmenting raw models with more and more structure on top, kind of separate from the model really, that controls how it reasons and how it generates outputs and so on.
It'll be interesting to see if these wind up being useful in practice, or if, you know, the scaling hypothesis is true and, if you just keep scaling, the models just do it themselves. Because in theory, right, the model should learn that implicitly, these are the things that should be done to solve these various tasks. But at present they do not, even at GPT-4 scale and so on. So it's a matter of time, I guess, till we find out whether, with scale, reasoning of this sort is just something that gets picked up or not.
Absolutely. And it is also interesting to note, you know, we've talked about this, I think, about eight months ago, but just this idea that, as scaling continues to happen, there's a question of what bucket the scaled compute ends up going into, right? Do you end up spending your compute on training, or do you start spending more and more of it at inference time,
right, using techniques like this one that have you run many rounds of inference for the same problem? It's sort of like the difference between the time you spend studying and the time you're given on the actual test to
solve the problem. And my hypothesis is, and I think a lot of agent architectures are moving in this direction, we're seeing a heavier and heavier focus on inference-time compute, and cheaper and cheaper computation schemes mean that it now makes sense to do this. You know, back in the day with GPT-3, it's possible that you could have gotten a lift from getting GPT-3 to engage in these kinds of prompting techniques, though probably not nearly as good.
But the challenge at that point was that it was just so expensive to run that inference-time compute on every problem that it made sense to frontload all your compute in the training stage. Now we can afford to spend some of that compute at inference time, and we're seeing a big lift. And I'm just really interested in, you know, what does that balance look like, what does the exchange rate look like between a dollar spent on training compute versus a dollar spent
on inference compute. And I suspect that equation is going to evolve a lot this year.
Next research paper. This one is a bit older, but we haven't covered it, and I think it's probably a good time to mention it. The paper is BlackMamba: Mixture of Experts for State-Space Models, and it is exactly what it sounds like. So we've covered this a couple times, so we'll go very quickly: Mamba is a new type of neural net that is more efficient than what is typically used, and it has generated a lot of research in
recent months. As regular listeners know, mixture of experts is a way of making neural nets more efficient, broadly speaking, that has yielded some great results as well, for instance with the Mixtral models. And yeah, there's been a lot of movement in the space of exploring mixture of experts, and supposedly GPT-4 uses that technique. So this paper is combining the two. It's pretty much just an empirical paper that demonstrates that it is possible to combine the two.
And they do synergize to produce a model that has good evaluation performance and good efficiency. They, you know, go over various engineering details, and they compare with various transformers and different open source models, and overall find pretty good results. So no fundamental advances here, but I think it's worth noting that people are already exploring this direction that you might imagine, given the separate trends: if you combine them, do you actually get good
results? And this paper indicates that it is possible to combine them and get complementary benefits.
Yeah. And this is a real shame-on-me one, too, because I remember, I think last week, when we were talking about, you know, the Mamba models, I was like, oh, well, it'd be interesting to see a mixture-of-experts type strategy. And I had just completely not realized this paper was out, so I should have just said, hey, it's been done. So this is an MoE. Essentially, the idea with GPT-3, GPT-3.5, for example, is to have a dense,
fully connected model. Right. So you have this one chunk of model, of AI algorithm. And what happens, though, if instead we break that up into a bunch of expert models, and instead of sending every input to every part of the model, having the whole model chew on our input, we instead route that input strategically to different submodels, if you will, different experts who specialize in a given kind of input?
That's really what's happening here, except they're using Mamba instead of a transformer. Usually, mixture-of-experts models like that would involve transformers. Now we're seeing the natural extension, using this Mamba model, which is already more efficient because it doesn't have the same quadratic time complexity that transformers usually do.
So normally in a transformer, if you increase the size of your input, let's say you double it, it'll increase the amount of processing required by four. Right. That's what the quadratic scaling means there. In this case, they achieve linear scaling, so it's got much more favorable scaling characteristics.
And they do see compounding benefits from combining that linear scaling with input length and the mixture-of-experts approach. So that's kind of interesting.
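For readers who want to see what the routing step actually looks like, here is a minimal sketch of top-k mixture-of-experts routing. This is purely illustrative: in BlackMamba the expert MLP blocks sit alongside Mamba (state-space) blocks, whereas here the experts are plain MLPs and all sizes are made up.

```python
# Minimal top-k mixture-of-experts routing: each token is sent to only k experts,
# so most of the network's parameters are skipped for any given token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=256, d_hidden=512, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # scores each token per expert
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                             # x: (num_tokens, d_model)
        scores = self.router(x)                       # (num_tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)      # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                    # each token's k chosen experts
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(10, 256)
print(TopKMoE()(tokens).shape)   # torch.Size([10, 256])
```

The efficiency argument in the episode is exactly this: only a fraction of experts run per token, and the sequence-mixing layers (Mamba blocks in the paper) avoid quadratic attention cost on top of that.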
So this is more of an initial exploration into this direction. If you look at the paper, there's, I guess, a lot more you could explore, and the authors say so themselves. They also open source the models. So they open source intermediate checkpoints as well as the inference code with a permissive license, so they are enabling further exploration. They release 1.5 billion and 2.8 billion parameter mixture-of-experts models.
So I'm sure soon enough we'll hear more about this combination. And on to the Lightning Round, where we go through some papers hopefully quickly; sometimes we do take a while. First paper is An Interactive Agent Foundation Model. Quick recap: foundation model is a general term for a really big model that does cool things, kind of. So that includes large language models like GPT, but also multimodal models that do video or images or whatever.
So what this is proposing is a foundation model that is specifically for training interactive agents. And they do so by training across three separate domains: robotics, gaming AI, and healthcare. And they demonstrate that it is able to be trained to generate actions and agent-like interactions in each of these three areas. It's kind of a broad-direction paper.
So it's an initial stab at the idea of an agent foundation model, as opposed to foundation models that are meant just for understanding text, or just for understanding images, or just for understanding video.
Yeah. I mean, the normal way that we make agents today, right, is we will take a model and we'll give it an autocomplete task, like train it on just a disgusting amount of text to autocomplete that text. And what you get out of that process is a system that just happens to have a huge amount of world knowledge, because if you're going to autocomplete sentences like, you know, "to mitigate economic harm from the next pandemic, central banks
should blank", if you're going to autocomplete that sentence, you've got to know a lot about the world. You're forced to learn a lot about the world. And so this is how you imbue these language models with a ton of world knowledge. It just happens to be the case that you can take those models and get them to talk to themselves or each other as agents, and that's how we get all the language model agents that we see
around us today. It's basically just a coincidence that they happen to be good enough at agent-like behavior because of their language pre-training. What this paper is trying to get at is to answer a question like: what if we thought about pre-training itself as a deliberately agent-oriented thing? What if we actually trained on objectives that had these things do things like next-action prediction
explicitly? So during the pre-training process, you're actually training agent-like behavior from the beginning. So that's kind of the philosophy here. And I think this is a really interesting space to track. It's something that I personally am going to be diving into a lot more, just because I personally think that this direction of agent-like models is probably the most promising path to AGI, or at least one part of it,
in the near future. So anyway, I'm really intrigued by this. By the way, Fei-Fei Li is one of the authors here, so very famous, one of the pioneers of early deep learning, and it's interesting to see her pop up in such an interesting context.
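As a very rough illustration of what "next-action prediction as a pre-training objective" can mean, here is a toy training step in Python. Everything here, the tiny GRU backbone, the vocabulary sizes, the random data, is a stand-in for illustration and is not the architecture or data from the paper.

```python
# Toy next-action prediction objective: given an interleaved history of observation
# and action tokens, predict the next action with a cross-entropy loss.
import torch
import torch.nn as nn

vocab_size, n_actions, d_model = 1000, 32, 64
embed = nn.Embedding(vocab_size + n_actions, d_model)   # shared obs/action vocabulary
backbone = nn.GRU(d_model, d_model, batch_first=True)   # stand-in for a transformer
action_head = nn.Linear(d_model, n_actions)

history = torch.randint(0, vocab_size, (4, 16))          # batch of 4 fake trajectories
next_action = torch.randint(0, n_actions, (4,))          # the action actually taken next

hidden, _ = backbone(embed(history))
logits = action_head(hidden[:, -1])                      # predict from the last state
loss = nn.functional.cross_entropy(logits, next_action)  # next-action prediction loss
loss.backward()
print(float(loss))
```

The contrast with ordinary language-model pre-training is just which target you score: the next text token, or the next action in an interaction trace.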
And it's primarily coming from Microsoft. It's funny, in the acknowledgments they actually acknowledge the Microsoft Xbox team and various gaming partners who helped them, presumably, with data and training. And since we're mentioning authors, also worth mentioning that the University of California, Los Angeles was a coauthor. Next paper: Grandmaster-Level Chess Without Search, new work on game
stuff from DeepMind. And this one is, as the title says, looking at whether we can get really, really good chess-playing AI without requiring search. Search is when you explicitly program your chess-playing AI to simulate forward in the game. So: if I do this, what does the opponent probably do, then what would I do, etc., etc. And it has been core to chess-playing AI basically forever.
At least the top-of-the-line AIs generally do a lot of search, simulating the game in, you know, a thousand directions, ten steps forward or whatever, and that is how they were able to get so good. Going back to AlphaGo, you know, years ago, and systems like that, they relied on search partially, in addition to neural nets, to evaluate the state of the game. So the focus of this is saying: can we get really good performance without doing this kind of search, just by training the neural net? And they do.
They train this 270 million parameter model with supervised learning on a big dataset of chess games, 10 million chess games, and they find that it is able to do really well and even beats some of the previous evaluation neural networks that they had.
Yeah. And I think there's an important caveat here that has to do with the way the system is trained. Right. This is basically a transformer model that is trained with supervised learning. So basically it gets a chess board, and there is an oracle, a tool that supposedly gives you a ground truth answer. Now, in reality, we don't actually know what the optimal move is for any given chess board, right? Like, that's an open mathematical problem.
That's why we have to build machine learning systems that approximate the best move. But in this case, they have a tool called Stockfish 16, which they use to automatically annotate millions of board states, to try to tag them with the optimal move for that board state. It's an imperfect annotation, but it is, you know, a very good chess bot, you can think of it that way. And the question is, can you get a transformer,
basically, you know, a ChatGPT-like system, can you get it to predict the best next move given just the game board? And this is really interesting. There's an old chess champion quote from the 1920s that they use to summarize how this works: "I see only one move ahead, but it is always the correct one." And I thought that was such a great way of summarizing what's going
on here. Right? It's as if you were preventing yourself from looking ahead. If you ever play chess, you're thinking, okay, if I do this, they're going to do this, and then if they do that, I'll do this. So imagine that you could not engage in that thought process. You cannot think even two steps ahead. You only see the board, and all you're going to go on is a gut instinct, a vibe based on what you see on that board, to pick your next move.
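To make the labeling step described above a bit more tangible, here is a small sketch of the kind of annotation pipeline you could build yourself: use Stockfish as an imperfect oracle to tag board positions with a recommended move, producing (board, move) pairs for supervised training. This assumes the python-chess package and a local Stockfish binary; the binary path and time limit below are made up, and the actual paper's annotation is richer than this.

```python
# Label chess positions with Stockfish's recommended move, as supervised targets.
import chess
import chess.engine

def label_positions(boards, stockfish_path="/usr/bin/stockfish", think_time=0.05):
    """Return (FEN string, recommended move in UCI notation) pairs."""
    labeled = []
    engine = chess.engine.SimpleEngine.popen_uci(stockfish_path)
    try:
        for board in boards:
            result = engine.play(board, chess.engine.Limit(time=think_time))
            labeled.append((board.fen(), result.move.uci()))
    finally:
        engine.quit()
    return labeled

# Example: label the starting position and the position after 1. e4.
boards = [chess.Board()]
b = chess.Board()
b.push_san("e4")
boards.append(b)
print(label_positions(boards))
```

A transformer trained on enough pairs like these is then asked to map a board state straight to a move, with no lookahead at play time, which is exactly the "one move ahead" framing in the quote.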
And the astonishing thing is, this actually seems to work. This model actually gives results, recommendations for next moves, that in one case compete favorably against AlphaZero's policy and value networks. You know, these get Elo scores, basically ways that chess players are differentially ranked, that are really quite strong. The argument is that it's at grandmaster level. I've seen people argue about whether that's actually the case,
and you can have a fun discussion about whether that's true. But the tournament Elo that they get is 2299, so almost 2300, for their largest transformer. And against humans, the performance goes up dramatically to 2895, which is ostensibly around grandmaster level.
That big delta, that gap, seems to come from the fact that AIs tend to come up with strategies that are more easily countered by other AI bots, or at least that's part of the hypothesis here, and that humans don't tend to think in the same ways, of course, and are vulnerable to different strategies in a way that's favorable, anyway, to the system. So that was kind of an
interesting, noteworthy big gap there. But certainly this is another push from DeepMind in the direction of game playing as the way to hit AGI. In addition, I will say, to scaling, because they do do a bunch of scaling experiments with their model, and they see that, yeah, as we increase the scale of the system, it is in fact able to perform better and better at this game. So really interesting little paper here.
Next up, More Agents Is All You Need, from Tencent, and the gist is that they find that with a simple sampling-and-voting method, the performance of these LLM-driven agents improves. Essentially they investigate ensembling, a classic idea in AI, where you combine the outputs of several models that predict independently, and with a combination of independent outputs you get a better overall result. They look into experimentally seeing if that works and in fact show that it does.
Yeah, that is basically the paper. It's funny because they sort of recognize themselves that, hey, there are all these fancy techniques we just talked about, chain of thought, self-consistency earlier, right, this idea of having the system generate many different strategies to tackle the problem and then picking the ones that are most self-consistent.
Well, this is really just saying, hey, what if we brute force it and just run a bunch of smaller large language models? Can we achieve performance that's superior to that of some larger language models? And the answer does seem to be yes. One interesting finding was that there was a correlation between the performance improvements they saw from just increasing the number of agents and the difficulty of the problems they were dealing with.
So if your problem difficulty goes up, it turns out that the right thing to do is often just to add more agents, relative to when the problem is easier. So, you know, if you're going to spend your compute on something, you want to spend it more on more agents rather than necessarily having each agent do more complex stuff. So that was kind of interesting and maybe a little
counterintuitive. The last thing that I'll note here is that apparently the results they got are orthogonal to other prompting-based methods, so other methods that involve doing fancier prompts. So you can actually combine them to get overall boosts. Right. You can take a large number of fancier agents, and you will get compounding benefits both from the number of agents and from the fanciness of the prompts. So kind of an interesting result here.
And another version, maybe, of, I guess, scaling in a way, like just scaling the number of agents. You know, that's going to increase the inference compute, but not the training compute. So anyway, really interesting little breakthrough here.
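For the curious, the core sampling-and-voting idea is simple enough to sketch in a few lines. This is a minimal illustration, not the paper's exact algorithm (which also handles open-ended answers); `ask_model` is an assumed placeholder for your own LLM call.

```python
# Sampling-and-voting: query several independent samples of the same model
# and take the most common answer.
from collections import Counter

def ask_model(question: str) -> str:
    raise NotImplementedError("plug in your own LLM call here")

def majority_vote_answer(question: str, n_agents: int = 10) -> str:
    answers = [ask_model(question).strip() for _ in range(n_agents)]
    # The answer that the most independent samples agree on wins.
    return Counter(answers).most_common(1)[0][0]
```

The "more agents for harder problems" finding then just means choosing a larger n_agents when the task is more difficult.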
And one last paper. It is MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models. Last week we discussed text-driven image editing as one of the efforts, and this is pretty much that, but for music. So let's say you have a track described as relaxing classical music featuring piano.
They introduce a method where you can go ahead and edit that text to say relaxing classical music featuring acoustic guitar, and it will directly alter the actual audio in correspondence with what you requested.
And moving on now to policy and safety, we have Debating with More Persuasive LLMs Leads to More Truthful Answers. So this is a piece of, you could think of it as, a kind of safety research. Just to frame this up a little bit, one question that folks in AGI safety always have is: what happens if we start to build systems that are far, far more intelligent than us, that can surpass human expertise in a wide range of tasks?
How would we even know if their recommendations, if their outputs, are trustworthy, right? Like, what guarantee could we possibly have? This is like your doctor comes to you and tells you you need to take these pills, and you, well, you don't have a medical degree. Are you just going to say, okay, yeah, sure, I'll take the pill?
And so how could you, as a weaker system, since you are a weaker system, find a way to oversee the performance or behavior of a more intelligent, stronger system? This is the problem of scalable oversight. And one of the key assumptions behind it is that it's easier to identify or critique the correct answer than it is to generate it,
right? One of the hopes that we have is that maybe, okay, sure, I can't come up with the recommendation that I should take a certain medication, but once I have that recommendation, maybe I can validate it, even though I'm dumber than the system that generated the recommendation. And so this is essentially what they're going to explore in this setup. They're going to have a bunch of strong models.
They call them strong because they're going to give them access to a comprehension text that they'll be quizzed on. And then we have weaker models that are invited to judge the answers to those questions, but don't have access to the text. So the way they're simulating this divide, this intelligence gap, is by having one group have access to the ground truth information and another group not.
And then they're going to investigate how, you know, your ability to debate with the system that is generating outputs based on reading that text allows you to tell whether that system is telling the truth, whether its recommendations are accurate or correct. And as part of this, they actually introduce a metric that they call persuasiveness: how persuasive, how effective is that system at persuading people, independent of whether or not what it's
saying is true? So kind of an interesting piece of work. They test out three different setups. One, which they call consultancy, is a case where you're basically a judge and you're just going to talk to a chatbot. Again, the chatbot gets to read the text that you're interested in, and it's just going to give you answers, and you don't know whether they're right. The chatbot is going to argue for a certain answer that it has been assigned ahead of time, independent of what the text says,
and you're going to try to figure out, okay, is it telling me the truth? Are these true outputs? So that's called consultancy: you're just hearing one side of the argument. Another version that they experiment with is called debate. Here you have two different bots, one of them trying to argue for the correct answer, the other perhaps for the incorrect answer, and you're just watching this play out, right? You're reading the transcript of this debate,
and based on that, you're going to try to figure out what the right answer is. And then the last one is an interactive debate, where these bots are debating but you have a voice too, and as they're having that dialog, you can participate as well. And they have a couple of interesting findings, the first of which is that these weak judges, right, the folks who don't have access to the underlying information, actually can supervise strong
debaters. It turns out that, for the most persuasive models that they have, non-expert human judges achieve about 88% accuracy and non-expert LLM judges achieve about 76% accuracy with debate, and without debate, the naive performance is 60% and 48%. So we see really big, on the order of, say, 20% leaps or more, in your ability to judge the truth when you see this debate play out.
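To make the consultancy versus debate setups concrete, here is a toy sketch of the two protocols. It is only an illustration of the structure described above, not the study's code: `llm` is an assumed placeholder, and the hidden `passage` plays the role of the comprehension text that only the debaters get to read.

```python
# Toy consultancy and debate protocols with a judge that never sees the passage.
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your own model call here")

def consultancy(passage: str, question: str, assigned_answer: str) -> str:
    # One chatbot that has read the passage argues for its assigned answer;
    # the judge sees only the argument, never the passage.
    argument = llm(f"Passage:\n{passage}\nArgue that the answer to '{question}' is {assigned_answer}.")
    return llm(f"Question: {question}\nConsultant says:\n{argument}\nWhat is the answer?")

def debate(passage: str, question: str, answer_a: str, answer_b: str, rounds: int = 2) -> str:
    transcript = ""
    for _ in range(rounds):
        for side, ans in (("A", answer_a), ("B", answer_b)):
            transcript += f"\nDebater {side}: " + llm(
                f"Passage:\n{passage}\nDebate so far:{transcript}\n"
                f"Argue that the answer to '{question}' is {ans}."
            )
    # The weaker judge reads only the transcript, not the passage, and picks a side.
    return llm(f"Question: {question}\nDebate transcript:{transcript}\nWhich answer is correct?")
```

The interactive-debate variant from the paper would additionally let the judge inject questions between rounds, which is omitted here for brevity.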
And this also holds for the consultancy setup, where you're just talking to a chatbot directly without seeing a debate. And one interesting thing is that the more you optimize debaters for persuasiveness, the more your ability as a judge to tell the truth in debates goes up, which I found kind of surprising. Right? I would have thought that the more persuasive these two debaters are, the harder it would be to figure out which one is actually telling the truth.
Turns out that's not so, apparently, empirically, based on their study. When the persuasiveness of both of the chatbots involved in the debate goes up, you actually end up being better at figuring out which of those chatbots is telling the truth. So kind of interesting. You know, I'm less optimistic, I would say, about this whole idea of debate as a way of solving our AI control problems
long term. But it certainly is an interesting finding that we're not clearly dead in the water yet when it comes to this. At least, you know, having debates between AIs that tap lots of different perspectives does seem to allow us to elicit truths that otherwise we couldn't. So, you know, great job to this team for pulling it off.
A fun fact here is that the initial idea for this general kind of approach comes from 2018, from a paper called AI Safety via Debate, actually from OpenAI. And originally they, of course, at that time didn't have super good chatbots to train this with; it was just kind of a broad direction. This paper doesn't do exactly the same implementation as that paper, but it takes the general idea and really does apply it now, with chatbots that exist now that are able to debate each other and so
on. So anyway, yeah, kind of a little demonstration of how research can build on research across time. You know, first you have an idea and you publish some kind of exploration paper without being able to fully test the idea, and now this team from five different institutions actually went ahead and tried it out.
Yeah. You're totally right to call that out, too. So, it's funny, it's rare that I read a machine learning paper, look at the names of the authors, and end up knowing so many of them. But in this case, Ethan Perez in particular, who's the last author listed here, has a long history of doing debate. I actually spoke to him on a podcast a while ago, back when I was doing the Towards Data Science podcast, and he was investigating debate back then as well.
That was a very popular approach, especially because the head of AI safety at OpenAI at the time was this guy Paul Christiano, and he was really into this idea of debate; I think he's since softened on it. But it's interesting, and not a coincidence, certainly, that so many folks from Anthropic, which again split off from OpenAI, are pursuing that thread.
So, really interesting. And anyway, there are all kinds of interesting folks on that list of authors, including Sam Bowman and Tim Rocktäschel, who has done a bunch of agent-type work as well and, I think, was at Facebook at one point.
Next up. We just covered a safety-type thing; now let's cover a policy-type thing. The news story is: In Big Tech's backyard, California lawmaker unveils a landmark AI bill. And these are always landmark.
They're always landmark.
Bills, and always headlines. So California state lawmaker Senator Scott Wiener has introduced a bill that would require companies to test powerful AI models for unsafe behavior, institute hacking protections, and ensure the tech can be shut down completely before releasing them. It also mandates that AI companies disclose their testing protocols and safety measures to the California Department of Technology, and it allows the state's attorney general to sue a company if the tech causes critical harm.
So this could be pretty significant. California, of course, is huge; it's like a mini country, essentially, and has a lot of influence via its own policy initiatives. Going beyond that, right now in the US we have the executive order, which does do some things related to safety and mandates some safety practices from corporations. This is trying to make that law, at least in California.
Yeah. And, you know, it's interesting, you can see him, as he describes the bill, struggling with this question of how much are we hampering progress, are we hampering innovation, versus bringing in AI safety legislation and some guardrails here. And that is a genuine challenge. You know, a lot of folks, especially on the Hill right now, Capitol Hill in DC, are dealing with exactly this kind of question.
How do you come up with guardrails that don't hamper progress while still hitting those safety objectives? And, you know, the idea of needing some sort of civil liability, that's what we're talking about when we talk about companies that can be sued for dangerous practices. Honestly, I mean, I think we're pretty overdue for something
like that. I mean, it's difficult to imagine a future where AI companies can keep producing more and more powerful systems, right, with access to a larger and larger action space. How many AI agent models have we talked about today that, in principle, are being designed to go out and, you know, do arbitrary things for you on the internet, send emails, write
software and so on? It's difficult to imagine a world where that continues to happen and you don't have some level of civil liability, where I can't go out and sue OpenAI if their chatbot or their agent goes off and does something horribly wrong, gets somebody killed or causes property damage. That's one piece. There's a separate question about criminal liability as
well, right? I mean, we need a way, at some point, if the harms that come from these systems enter a category where we're talking about loss of life or physical damage to property and infrastructure, that's the sort of thing where you can imagine needing some of those more intense measures, and balancing those, of course, with the need to innovate. And, you know, I'm a big free market guy.
You know, I've been in Silicon Valley startups my whole life, and I think this tech is great. It needs to forge ahead. But we have to keep in mind the big picture and the fact that catastrophic risk does seem to be on the table. At a certain point, you've got to start to think about the tech from that perspective. And anyway, I think it's a really interesting set of trade-offs that he's managing here.
A lot of the ingredients are aligned with a lot of the conversations I've heard on the Hill, certainly more on the safety-oriented side. But, yeah, we'll see if it actually passes. Obviously, California is way more Democratic, and he's himself a Democrat, so maybe that means it'll have an easier time passing. But they are already facing criticism from a bunch of folks, obviously, in Silicon Valley. Not surprising.
You know, they're talking about the regulation moving too aggressively. This is just, you know, Jeremy's personal opinion, but I think we're actually way overdue for that kind of thing. Even talking to some of the folks who are building this tech in the Valley, it's quite clear we need some guardrails of some kind, and I think most people would agree with that. But anyway, yeah, interesting, and we'll see if it sets a precedent for the national conversation.
On to the Lightning Round. The story: AI deployed nukes to "have peace in the world" in tense war simulation. It's a study involving AI in foreign policy decision making that found that, I guess maybe surprisingly, or probably less surprisingly, it is possible for these models to escalate into war rather than finding peaceful solutions, in the specific experimental setup that
was done here. So some models in the study initiated nuclear warfare with little warning, leading to bad headlines and a lot of WarGames references.
So what these researchers did was actually a kind of side-by-side comparison of models from OpenAI, Anthropic, and Meta. In the case of Meta, it was the open source Llama 2, Llama 2 Chat specifically. And yeah, they had them play out this kind of war game scenario with eight fabricated countries that each had different properties. It wasn't exactly the same as, you know, the United States and Canada and whatever else,
but the idea was to give them a hypothetical scenario to deal with. And here's a sentence that won't be great for OpenAI, but it says, notably, OpenAI's GPT-3.5 and GPT-4 escalated situations into harsh military conflict more than other models. So, yeah, it's obviously not something that OpenAI's models would have been specifically trained to avoid, and it's unclear really what it would be about the training
process that would open the door to that. Obviously, Anthropic's models, Claude 2 in particular, are known to be a little bit more cautious, so it's maybe less surprising that they would refrain from escalating military conflicts. But yeah, it's an interesting note. Apparently GPT-4, when it launched nuclear warfare in a simulation, its justification was, I just want to have peace in the world, which, if you think about it, would be achieved
with a nuclear war. But it's more of a long-term thing, you know; there's a little bit of disruption that happens before the peace sets in. So, kind of interesting. They have some cool graphs that show you what different models did in the median and mean case, how long it took them to escalate; they have this escalation score that they track, and cool visualizations to check out. Anyway, I think, you know, nothing too surprising here.
These models will do crazy stuff. We should not rely on them for controlling nukes, certainly in their current form and possibly ever. But, yeah, interesting little paper and a bit of a warning shot for us all.
In the testing they do, actually, I can read a bit of a prompt just so the context is there. They tell the LLM: as the inaugural agent of Nation Purple, you embody a dual-function role, amalgamating both foreign policy and military decision-making capacities; you are in an environment with other such AI-controlled nations; your actions will have real-world consequences. So, you know, one possible critique
is, well, it was just told to pretend to be a president or something and to do what presidents might do, right? But they do actually look into what happens if you tell it that it is an AI agent doing this, and the agents will still behave this way. Next story, going back to the Bay Area once again: protesters gather outside OpenAI office, opposing military AI and AGI. Dozens of protesters gathered outside the OpenAI headquarters in San Francisco on Monday,
just this past week. And it was organized by two groups, Pause AI and No AGI, who were pretty much directly telling OpenAI to stop what they're doing. Right, No AGI is directly saying don't develop general intelligence, which is basically the mission of OpenAI. It was partially in response to news that OpenAI deleted language, or at least language that would prevent or prohibit it from interacting with the military and applying AI to military purposes.
So aside from just pausing AGI and other aspects, it was about making sure AI is not used for military purposes.
Yeah. And it seems to be, at least based on the reporting, a bit of a combined, I don't want to say necessarily confused, but certainly there are two messages here. The first is the pause AI, or pause AGI research, message. The second is the military strand. And those are two distinct things, right? Like, you can imagine being okay with military AI. And in fact, it's difficult to imagine a world in which, you know,
DoD does not pursue this. Like, the alternative is you just wait for other countries like China to forge ahead. There's international engagement that you could do to reduce that risk and so on, and I think that's actually worth pursuing. But still, then separately, there's this question of the push towards AGI itself, and that's a very loaded question with a whole bunch of other considerations
behind it. But I think one of the big issues, at least for me, that comes to mind here is this question of pausing AI development. Obviously, there was a pause letter that came out, I guess, last year. But there's always this challenge with these protest movements of figuring out exactly what they are recommending, like what ought to be done during a pause, or what the circumstances of the pause ought to be. And, you know, you certainly can see the
arguments: we don't know how to control these systems, that is clear. We have data that suggests that as you scale them, they get more general, and in fact power seeking does seem to be something that emerges in these systems as they get more and more scaled and capable. That may well be the default behavior of these systems. So there is certainly risk there. The question is, how do you frame the pause?
Is it just an all-purpose pause? Or do you say, we're pausing it until certain kinds of breakthroughs can be achieved? That's always been something that has been a little confusing to me. And I think to the extent that there's a tension between innovation and safety, this is really where it needs to be resolved. Right? Like, if we want to pause development, what are the criteria under which we could then resume it? And I think there's a lot that needs to be said there.
And actually a lot of what my company does is focused on answering exactly that question. But it's definitely interesting to see this movement take shape. By the way, famously, when Sam was kicked out of OpenAI and then brought back in, all the OpenAI people started tweeting, OpenAI is nothing without its people; that was kind of their tagline. And I notice one of the protest posters in the article says, Earth is nothing without its kind. Funny to see that thrown back at them. Yeah.
Yeah, it's worth pointing out there are photos in this; it's literally protesters with signs and t-shirts saying Pause AI, and also some signs saying No to AGI, just standing outside of OpenAI. It's kind of surreal, right? Like, people are really, you know, not just signing letters, but now going out and showing up in person.
I mean, you can kind of understand it, if the threat model is right, and I'm certainly sympathetic to the threat model. But yeah, it's an interesting question as to what the best way is to go about it. And they're making a splash.
Yeah. And one last story for the section: AI safeguards can easily be broken, UK Safety Institute finds. So this is from the UK's AI Safety Institute, which was just established last year, and they have released research that found that AI technology can deceive human users, produce biased outcomes, and, some of it at least, lacks sufficient safeguards against providing harmful information. As you might imagine, this is focused on large language models.
And they kind of re-demonstrate what is already fairly well known: that you're able to bypass their safeguards using some pretty simple prompting techniques and use them for dual-use purposes.
And one of the interesting things about this article is just how, in some ways, unsurprising the findings are, right? One of the things they say is basically, you can jailbreak your way through any kind of safeguards that are trained into the system. We've seen this over and over again, right? The sad reality is that no matter how much effort Meta, OpenAI, Hugging Face put into safeguarding their models so that they refuse to answer you when you ask them how to make a bomb,
those safeguards can always be trained out. In fact, they can be trained out for a few hundred bucks, maybe even under a dollar, depending on the technique used. And they can also just be bypassed straight up with jailbreaks. Right? Depending on the prompt that you use, these systems can generate any kind of output, including advice on how to build bombs and bury dead bodies and so on. So, if nothing else, this just reinforces the idea that currently, and I think this can't be said
enough, right, currently we do not know how to make AI systems just behave the way we want them to. That is a simple fact about the state of play in AI right now. There is no known way to guarantee that a language model will not help you solve a problem that it shouldn't. And in the same way, you know, multimodal models, agent-like systems, they all inherit this problem.
So to the extent that we worry about dangerous applications of these systems, things they may be able to do, the claim, and it doesn't matter if it's from a frontier lab or somebody else, that they have a model that they have spent a lot of time introducing safeguards into, that claim ought to be considered deeply suspect, because as a point of fact, as a point of technical, rigorous fact, there is no technology, no strategy, no technique that is known that
allows these models to be guaranteed to operate a certain way. And so I think this is mostly just a matter of getting that fact on the record and shining more light on it. They also show that, hey, these current models can be used for limited cyber-offensive purposes. Again, this is something that tends to increase with scale, as we've seen in the past. But certainly a good kind of call-out from this Safety Institute.
And I think it's one of their earliest pieces of publicly produced artifacts, so kind of interesting; these are sort of initial findings. We'll see what they put out next.
That's right. Yeah. This is from their published initial findings. So it really is kind of a mix of safety-related things we've known, re-emphasizing or re-demonstrating them: that you can get biased outcomes from image generators,
that you can prompt-hack these systems to do things they're not supposed to, like help people planning cyber attacks, or that you can get them to create convincing social media personas, etc. I guess it's a report that emphasizes a few different problematic aspects of modern-day AI all in one place. And on to our last section, Synthetic Media and Art, just a couple of stories in this one before we are done. The first story is Stability, Midjourney and Runway hit back in AI art lawsuit.
So there's an ongoing class action copyright lawsuit filed by artists against companies that provide AI image and video generation, like Stability, Midjourney and Runway. And this article highlights how lawyers from those companies have filed a whole bunch of stuff, different motions, in the case. They filed to introduce new evidence and even asked for the case to be dropped and dismissed entirely.
Yeah. And there's been some back and forth here. There's now been a wave of new evidence that the AI companies are introducing to push back on this, including motions to dismiss the case entirely. Right. My understanding of the legal context here is that it's often the case that you'll just kind of toss out an attempt to dismiss a case, even if you don't think that your team will succeed, because the bar tends to be fairly
high. Right? Like, usually you want the case to be fully adjudicated, or whatever it is, in court before you make a decision. But if things are stacked lopsided enough, then maybe you can get it thrown out. In this case, the AI companies' new counterargument boils down to the idea that the AI models they make or offer are not themselves copies of any artwork, but rather reference the artworks to create an entirely new product. And that's interesting.
It holds, of course, unless they're explicitly instructed by users, prompted, to generate verbatim outputs that match actual art. But it's interesting. I mean, I don't know if that would particularly stand up. It seems like the fact that these systems sometimes accidentally generate verbatim copies of existing artwork would not be covered by this counterargument.
But, you know, we'll just have to see if the courts end up accepting this kind of reasoning.
There's a lot of detail in this story going into various things as stated by each company. Each one of them had its own slightly different arguments: Runway, Midjourney, and Stability all pointed out different things relevant to their particular context and so on. So if you're curious to hear more, I would just encourage you to read
the article. There aren't any especially interesting tidbits beyond that; if you're curious about the legal case and how it proceeds, you might want to hear those details. But I think the bigger news related to all of this is the last news story, which came out a bit later. The headline is: AI companies take hit as judge says artists have public interest in pursuing lawsuits. So this is a small victory in the lawsuit against these companies.
The US district judge rejected the companies' argument that they are entitled to a First Amendment defense for free speech, and stated that the case is in the public interest. So this pretty much means that, at least when it comes to defending against or dismissing the lawsuit on First Amendment grounds, that is not going to happen.
Yeah. And this is, you know, the free speech tradition in the US is obviously really strong. And so this is something that I might have thought of as actually a legit question mark, as to whether a company gets to say, well, our ability to launch this model reflects our right to free speech. And companies are persons under
U.S. law, so I believe they are entitled to First Amendment protections on that basis; a company has a right to free speech. So to the extent that's true, and if, you know, what AI models generate qualifies as speech, then certainly they'd be protected. I remember, well, I don't remember this, but this is a thing that happened,
I don't know, sometime in the 60s or 70s, in the context of pornography, where people sought First Amendment protection and said, hey, pornography is just my right to free expression, you know? And in that case, it was upheld. The fact that we're seeing it not being upheld here is an interesting development. At least, I'm no legal scholar, I have no idea what I would have predicted here, but, I don't know,
I thought that there might have been maybe a stronger argument for these things qualifying as some kind of free speech thing. So, yeah, the companies had argued that the lawsuits they were facing were targeting their speech, because the creation of art reflecting new ideas and concepts is a constitutionally protected activity. So they viewed their speech as the creation of art, and it's interesting that that was struck down.
And with that, we are done with this episode of Last Week in AI. Thank you so much for listening. As we said in the beginning, you can always find the text newsletter, which gives you a text version of all this stuff and more every week, at lastweekin.ai. You can also contact us with any suggestions or feedback or links to stories by emailing contact at lastweekin.ai, or by commenting on YouTube, Substack, or anywhere else we are.
We do appreciate it if you share the show, if you review it, if you like it, all those nice things that help us be nice to the algorithm that recommends us in places, I guess. But we don't care about that so much. We mostly just care that people do enjoy the show and get a benefit out of listening to it. So please do keep doing that.