#170 - new Sora rival, OpenAI robotics, understanding GPT4, AGI by 2027? - podcast episode cover

#170 - new Sora rival, OpenAI robotics, understanding GPT4, AGI by 2027?

Jun 09, 20242 hr 49 minEp. 209
--:--
--:--
Listen in podcast apps:

Episode description

Our 170th episode with a summary and discussion of last week's big AI news!

With hosts Andrey Kurenkov (https://twitter.com/andrey_kurenkov) and Jeremie Harris (https://twitter.com/jeremiecharris)

Feel free to leave us feedback here.

Read out our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at [email protected] and/or [email protected]

Timestamps + Links:

Transcript

Andrey

Hello and welcome to the latest episode of Last Week in AI. We're going use chat about what's going on with AI. As usual. In this episode, we will summarize and discuss some of last week's most interesting AI news. And as usual, you can also check out our last week in AI newsletter at last week in AI for a whole bunch more articles we are not going to cover in this episode. Am one of your hosts, Andrey Kurenkov.

I finished my PhD focused on AI at Stanford last year and I now work at a generative AI startup.

Jeremie

And hi citizens of the internet, I am Jemire Harris. I'm one of the co-founders of Gladstone AI. It's neon, a national security company, and I just spent the last week in DC on Hill and elsewhere. And so Andre was kind enough to like, bump our usual recording slot. It's a Saturday. It's like 6 p.m. for you, Andre.

Andrey

Yeah, it's a little late, but you're going to get all the news this week that, crammed in. Usually there's like a few day gap between recording. Editing. This time is going to be like very short. So we're going to try and get all the news from this past week. So guess good for listeners.

Jeremie

Very fresh.

Andrey

And before we dive into the news, as usual, I want to spend just a bit of time calling out some feedback. We had one review on Apple Podcast drop from Radar Free 699, and the title of review is Less Eye Doom please, and the text review is less. I do please. So pretty clear cut.

Jeremie

What are you trying to say? I'm.

Andrey

Yeah, I don't know. And I will say I feel like we don't get into AI doom that much. Like we definitely talk about alignment a lot and concerns a lot. But those are even if you're not concerned about AI doom, I think alignment and safety are things that everyone should care about, even if you don't have an X risk kind of mindset. But thanks for the feedback.

Jeremie

No, all is appreciated. The critical feedback is always the best, right? That's how we wait. Hold on. Let me take that back. Say nice things. Everybody just say nice things.

Andrey

Yeah. That three star review did hurt our average. So if you want to, like, help out with the five stars, I don't know. And a couple others. I did get an email from Matthias, who has emailed us a couple times with some very nice comments on the episodes. Did mention that it's nice to have Jeremy back as a lively co-host. So yeah, there you go. I'm glad people are glad about it.

Jeremie

Oh, shucks. Matthias.

Andrey

Yeah, and just one more round of acknowledgments. We did get a bunch of comments on YouTube where we are now posting the recordings of this with our faces. So if you somehow are interested in that, you can go there. I might even start including more kind of screenshots and whatnot. Couple nice helpful comments there saying it's hard to find our names because we're not in the descriptions of the episodes. Okay, it was going to be right now, including also a link

to our Twitters. If you somehow want to follow and see all the AI memes I retweet every day. Yeah. And a lot of comments on there. So thank you for the YouTube audience. And if you do like your podcast on YouTube, feel free to go over and subscribe and like and all that sort of stuff.

Jeremie

Amazing. Thank you. YouTube comments I never thought I'd say that, but yeah.

Andrey

Right, and starting with the Tools and Apps section of a first news story is calling is your latest AI video generator that could rival open AI's saw services coming from very Chinese tech company Kwai Show. And yes, there's a new AI video generation model that produces videos up to two minutes long at HD resolution and 30 frames per second. And just from looking at them seems to be the closest to Sora that we've observed so

far. There are a lot of other competitors out there like PCA in the US, but with videos we generate are still not quite as high resolution or smooth looking. And visual compression in response to this has seemed to be that this is sort of one of the most impressive outings, and this is now, available as a public demo in, China. So, yeah, exciting progress in video generation.

Jeremie

Yeah, well, you know, the fact that it's a public demo in China also makes it very different from Sora, right? Which is still very much under wraps. And that makes it really hard to get a good head to head comparison. This is something a lot of people have commented on. There was an article we talked about on the podcast before about just the difficulty in practice of actually using Sora to make those videos that showed up on OpenAI's, you know, launch website and all the press.

It's unclear how good Swor really is, and ironically, we might actually have a better sense of how this model is just because it is more publicly available. We don't know much about it. Right? So I went on their website to kind of see what information I could gather. They do it is, by the way, all in all in Chinese. So I had to hit Google Translate. But apparently Cujo has, which is the company behind it, has set up this thing that Google Translate thinks it's called the big model team.

Maybe that's large model team, but basically you get the idea. They see a lot of things there about like how custom this architecture is really. It seems to be they're cleaning something that they themselves have designed. So when you read the description of it, there are a lot of themes that come up that do sound like at least they they rhyme with what? Open the eyes, Sora can do. You know, they're using a diffusion transformer.

They talk about this whole space time encoding that they're doing, again, very similar to what we heard in the context of Sora. So? So it certainly inspired related to technically the Sora model, but they do claim that this is a new kind of special architecture that they've set up. Worth noting, too, about the company itself. This is so the model, of course, is called cling. Or you might prefer calling it clean.

Andrey

Oops.

Jeremie

Sorry. That was like quite show itself, though is a company that a lot of people haven't heard of in the West. It is basically a competitor to to TikTok, essentially like Chinese TikTok. And as a result they do these short videos. So in by the way, in China, TikTok is actually called Douyin. So that's sort of their immediate competitor. They have a mobile app. It's all about sharing short videos and all that stuff. And so it makes sense that they're diving into this space.

They have gotten a lot of funding from the China Internet Investment Fund, which the state. It's a state owned enterprises controlled by the Cyberspace Administration of China. The important thing here is that they actually have with no as a golden share ownership stake in the company and quite show golden share ownership, is this thing that

the Chinese state uses. Basically, there are special shares that allow them to exercise, voting control to outvote all other shares, of other shareholders under certain circumstances. So in China, this is often used for the state to exercise de facto control over an entity. I mentioned not to say that what we're looking at, you know, we talk about quite a show. You really are talking about a company that is under an unusually high degree of state control.

So kind of worth noting when you think about this kind of very new and important player in the generative AI space. This is a company that has a pretty significant stateside footprint.

Andrey

Right. And I'm going to try and just show off some examples. We'll see if it works. But just looking at the videos that they have shared or the some of them. My observation is it's not quite as good as it's pretty obviously AI generated. So you see a lot of artifacts and whatnot. Still quite good, especially in terms of the consistency part where it seems sort of physically realistic. There's less noise, so definitely impressive.

But it's still the case that nothing out there has been quite as good as Sora. So OpenAI really ahead of a game on just about everybody still, and I'm sure it is just a matter of time, but somehow no one has quite managed it yet.

Jeremie

It's true. Yeah, it is always hard to tell how you know how cherry picked again, the saw videos are because it's a little bit more proprietary. But you're right, it does seem to be kind of lower quality. They did show, you know, some of those examples of, you know, a knife going through some fruit or something and actually cutting it. Right. And the effect of the cuts staying in the video. So people have used that to argue that it is in fact capturing some something.

You know, that through a physical simulation or in it that we heard with Sora, you know, is this model actually able to simulate physics that, you know, there's some stickiness, that argument here, for sure, but it does seem to be early days. It also may improve quickly. Right? So one of the things that they highlight is it's a self-developed model architecture. That's what they claim on the website. They claim they also have their own scaling law that they've developed for

this model. Right. So this might suggest that, you know, more scale, more compute, more data. Maybe we can see this start to reach more levels. If if the company can actually get their hands on this processor is which of course is a challenge given there in China.

Andrey

Next up, Apple Intelligence will automatically choose between on device and cloud powered a. So we have known for a while that Apple is going to talk a whole bunch about AI at an upcoming event at Wwdc 2024, and we have started to hear a preview of what's coming from various sources. So can't really for sure know this is happening. But most likely these summaries are probably going to include a lot of what's there. And this one is going to have.

Apparently this new AI systems of theirs will be called Apple Intelligence, and it will be integrated across the iPhone, iPad and Mac devices. It will provide better AI features and will have a chat YubiKey like chat bot powered by OpenAI. And it'll be focused more on that sort of functionality or less on image or video generation. It sounds like there's also going to be a lot of details with regards to privacy and security. So it will not be building user profiles based on what you do with it.

Somewhat in contrast to Microsoft, which had their whole recall. Thing recently. So yeah, are starting to see Apple coming out with a new kind of say, a different strategy compared to its competitors. And we'll be sure to cover it in more detail when this event does happen.

Jeremie

Yeah, apparently. So there's a Bloomberg report as well earlier saying that Apple is not planning on forcing users to use these new AI features. It's going to be opt in, which is in contrast to a lot of the AI features we've seen in the past in, you know, on different platforms where, you know, you want to default people to, to using it. So that'll be kind of interesting. One of the big questions obviously is you're at Apple,

right. So we're talking about a product with a brand that's built around security. So to the extent that you're planning on sending, you know, data off device to be processed on a server somewhere, what this is talking about that raises some

security concerns. And there have been reports that say that, you know, Apple may choose to focus on using their own M2 chips in data centers that have a secure enclave, basically, just to make sure that the data that's processed remotely is as secure as it would be on device. That the argument that, you know, they may try to make. So anyway, this is all part of that, that calculus, right. How much do you do on device versus how much do you

offload. There's for the moment and this may pass, you know, as algorithms get more efficient. But for the moment, that is the language in which the tradeoff between user experience and security is being expressed. Right. You can either have, you know, the the super secure on device computations happen, will be a little bit slower and more constrained, or you can have faster responses, you know, lower latency, but, you know, use a server somewhere where you have to send your data.

So that is kind of the balance Apple going to have to strike. And we'll we'll see what they end up choosing. It'll tell us a lot about what it means to do secure AI. You know, for a brand like Apple, they certainly are going to lead the way with that.

Andrey

Until Lightning Round. First story Udall introduces new studio 130 music Generation model and more advanced features, so the audio is one of these text to music services. In fact, it is the one we use to make our end of episode song every week. And I do quite like it.

And usually they offer their ability to generate 32nd chunks of music based on a text description and then ask them, here is about this new model will be able to generate two minutes of audio and will also be better, more coherent and structured in. Along with this, there are also some new advanced controls like random seed, the cue words, lyrics, intensity, a whole bunch of stuff that we're adding to have more control over the generation. So yeah, I think these. Types of text.

Music tools are advancing pretty rapidly, and this is currently only available to pro subscribers, but apparently it is going to be rolled out more widely soon.

Jeremie

Yeah, I really I find it so interesting to see all these new setting, you know, are being introduced, these new affordances, like how what are the knobs that you need in order to tweak your interaction with some of these AI powered tools? The random seed one is really interesting, right? Because, you know, if you actually build the AI system, you see this all the time.

It's, you know, how do you make sure that you have the ability to, although you're generating, you know, outputs with a certain amount of randomness baked in, how do you make sure you can reproduce a specific output, maybe that you generated earlier or a specific aspect of it? That's this random seed setting. It allows you to basically make the generation process a little, a little

bit more repeatable. So essentially, if there are some characteristics that you really liked about a song, you know, maybe I don't know the beat or, or the lyrics or something that you want to preserve into your next generation, even if you know other things like the lyrics might change, then you can do that. That gives you a little bit more control.

So it's kind of interesting. We're seeing this explosion of different, you know, knobs and buttons that we want to put in a user's hand so that their interaction with these tools have the affordances that users will turn out to really like. So anyway, I think part of that ongoing journey that we're all the we're all on together as we discover what generative AI user experience really need to look like.

Andrey

Next up, perplexity. AI's new feature will turn your searches into shareable pages. So this new feature is Perplexity Pages, and it will create well formatted web pages for their search queries. Flex. Hi, in case you don't know, is a company that is essentially AI powered search. You can query something. The AI or the tool will find a bunch of relevant web pages and use a large language model to summarize

them. And previously just sort of output ChatGPT ask answer to you, and this feature will let you output sort of like a Wikipedia page kind of thing. It will be much more esthetically pleasing and you can then share it. So it seems to be like going more towards the idea that you will use this to generate reports or things that you'll share with others, which, they already have to some extent. You can like share a link to your search to ours, but moving it more in that direction.

Jeremie

Yeah, it definitely feels like one, one strategic direction this could lead to is kind of, you know, generative internet where you're you're a little bit further away from the actual your web page content that's organically been written by human beings, at least for the most part. So it kind of, you know, it makes me wonder strategically, how precarious is the position of the average, you know, blogger or, you know, data source for a website on the internet.

If there's now going to be an intermediary layer of, you know, generative AI that just takes that treated as raw material and converts it into something that is, you know, more specifically designed for particular use or query. No idea where that goes, obviously, but that's a really interesting direction for perplexity to be pushing in. And, you know, we think about what it's going to take to actually compete with Google, which is perplexity game

plan. Yeah, that's a kind of interesting and plausible direction. I guess we'll have to see how that plays out.

Andrey

Next Level Labs AI generator makes explosions or other sound effects with just a prompt. So Novum Labs, the company that has so far focused primarily on voice synthesis from text, now has a new AI tool for sound effects. It can generate up to 22 seconds of sounds, and a vacuum can be combined on their platform with voices or actually

apparently music. And it sounds like they collaborated with Shutterstock to build the library and train on their audio clips, meaning that this perhaps is a bit safer to use than other, things that haven't used necessarily license data. This tool is free to use, but you do have to pay for commercial license to, be able to use this and of course, have various perks.

Jeremie

Yeah, it's definitely not the first. But, you know, more noise, more noise in the space, more noise in the space of noise. There are a lot more companies working on soul text. The sound space. Right. We had. Stability right stability released stable stability and know stable audio. Yeah stable audio last year and yeah especially that's about audio clips for music and sound of that. And then there was also an that is audio craft which can generate sound like background noises

and things like that. So definitely a lot of a lot of movement in this space, in this noise space. And yeah, I mean kind of interesting strategic positioning to again, you know, we're back to that whole question of the generative AI internet of the generative internet. We think about, you know, Shutterstock and like, yeah, what is the business model there?

If you're just a source of data, you know, eventually they're going to have to try to change their positioning because it does make them vulnerable to being kind of turned into just like the back, the back end, if you will, for a rapper that potentially could, you know, monetize more effectively.

Andrey

Notebook l m expense to India, UK and over 200 other countries. This platform is kind of a way to use their Gemini Pro chat, but it now supports a whole bunch of interface languages, 108 of them. And also pretty languages for sources and chat. What this notebook lamb thing is, is sort of more of an a chat bot. It is kind of an integrated UI to generate summaries of documents and work with things you upload to it.

I think, I guess you could think of it as being useful for homework, being useful for business analysis, stuff like that. So maybe less well known tool by game, but indicative that they're continuing to expand access to even this kind of stuff that isn't necessarily a big deal. And onto applications and business. And he began yet again with OpenAI, who seems to be front of this section every single week. But what can you do this time? The story is about OpenAI is restarting its robotics

research group. So back in the day before GPT three, before. You know, even GPT one. They. At OpenAI, I used to work on reinforcement learning. I used to work on robotics as sort of a two primary initiatives there. So in the first two years I worked a lot on video game on, I think it was Dota and getting human expert performance where via self reinforcement learning. And I also did some work on

robotics. Notably, they had this paper on solving a Rubik's cube with a humanoid arm and training that the, that kind of self bootstrap data collection, that sort of whole side of a company was shut down in 2021, and that was just a bit after GPT three came out. And they pretty much completely focused in on the lamb direction. But it does sound like they are starting to work on restarting that part of a company. They are now job posting for the search, robotics engineer and other job posting.

It says that are looking for someone capable of training multimodal robotics models to unlock new capabilities for our partners, robots and some other stuff like research and develop improvements to our core models and so on. So definitely kind of initial efforts there, but perhaps not surprising given the overall trend of a lot of work in robotics lately.

Jeremie

Yeah. No. Absolutely right. That it's unsurprising. It is, you know, very noteworthy how much Traction Robotics has gotten, especially recently. And OpenAI is actually been kind of right in the middle of this. Right? So their their in-house startup fund is invested in a bunch of companies that are making a lot of progress. We've talked about these investments where we have figure AI that raised, almost $750

million. So it's one X technologies, another one I don't think we've talked about, which is physical intelligence, but, you know, they've invested in a lot of these, these companies. And when they say you're here to support our partners, you can imagine that's going to be a big part of what they're talking about, right? The companies that the in-house startup fund invested in, we do know that this is a new hire because according to the job posting, it's going to be four quotes.

One of the first members of the team, we know from apparently Forbes that, somebody familiar with the matter has said that the group has only existed for about two months. So certainly very early on, there's been a bunch of hinting at the possibility of this robotics reboot, as the article puts it. You know, there was a press release in figures latest fundraise kind of hinted at this. And anyway, a bunch of a bunch of a bunch of hinting and, sort of gray signals from OpenAI about this.

There's. Yeah, right now a bit of an unclear kitty. There's a bit of murkiness around whether OpenAI is actually planning on developing their own robotics hardware. Right. That's really hard. That's a really big challenge. You know, getting getting your actuators set up, getting all the sensors, getting all that. They had struggled with that before. So it's possible that I'm just going to lean into the modeling side of that and get the interfaces.

We'll see. But ultimately it may also be positioned to compete with some of its own partners. We've certainly seen them do that in the past. Right. As for example, you know, ChatGPT and in other models start to increase their capability surfaces and gobble up what used to be startups that were just wrappers around the weaker versions of ChatGPT. You know, that may happen with robotics platforms.

You know, it's hard to know. But certainly this suggests OpenAI is like looking very, very closely at this whole space. And they're poaching, by the way, they're competing for the same small pool of talent that a lot of their partners are competing for. So at least in that sense, you know, it's not strict collaboration. There's a little bit of potential for friction there too.

Andrey

Yeah. It does seem like we have a bit of a hand here as far as their strategy. In particular, Forbes did say we have two sources that told them that OpenAI, it tends to coexist rather than compete against, companies that build the hardware, especially their partner companies, and that their intent to build technology that the robot makers will integrate into their own systems.

And that seems to be collaborated, or confirmed by a job posting that says that people hired for a position would be tasked with collaborating with external partners, as well as training AI models. And this coming a few months after it was an announcement, particularly not just for investment, but, figure partnering with OpenAI. And the announcement there was that figure would use ChatGPT and and open as models to, to be part of intelligence of their humanoid robot.

So to me, this sounds like pretty much kind of hiring some talent to strengthen that partnership and enable it, rather than really going back to what we are doing as far as doing some research with in-house hardware and so on. Next Saudi fund. Invest in China effort to create rival to open a. This is about being an investor in GPU. I believe we've already covered that. And they very much do aim to build an open AI rival.

And according to this report, prosperity seven, which is part of the state owned, state owned oil group Saudi Aramco, is participating in a funding round for that company as a minority investor. So, you know, not necessarily a major owner of us, but they are investing there. And the GPU is already set to be the largest generative AI startup in China by staff numbers, but has mainly been relying on local investment of government support.

So yeah, I think definitely you have more understanding of implications here, Jeremy. But I mean, I imagine given the US-China tensions, this is this will have some implications.

Jeremie

Oh yeah. Definitely. No, you're absolutely right. I mean, so one of the things they highlight is that this is the only foreign investor in the country's efforts to create a homegrown rival to OpenAI, which is in this case Japan, with which is the company that's being funded. So really, this is the first time China's been able to wrangle an investor to do this.

Why is this the first time? Well, one reason is that the U.S. has made it very clear that they don't want sort of sovereign wealth funds and, you know, other, other venture funds from any part of the world, really, that they can prevent from doing this to invest in Chinese

AI company. Yeah, we saw the tension, for example, with a lot of the other companies to sort of like G40 to stuff around, you know, whether or not to base servers in in the United Arab Emirates and all that stuff in the pressure the U.S. has been applying there. Partly, in fact, that had a connection to China as well, because there was a concern over, over possible ties there. So in this case, I think that's what you're seeing of China's sort of been forced to rely on domestic

investment. And that's taking the form in many cases of the nation state apparatus directly investing through various fund. So this is really an injection of fresh capital, kind of an interesting alignment here between Saudi Arabia and China. One of the things, though, that the article does highlight is, you know, the Saudis are certainly feeling Washington's pressure as well.

So the chief executive officer of of all at who is involved in this deal, says that, look, I would divest from China if I was forced to do so, making really clear like, look, we have to find a way to partner with Washington. And the signal is pretty clear. You know, if the hammer comes down, if there's enough pressure from Washington, even deals like this could fall through. And that's quite important because, you know, we've seen already the whole ecosystem around that, around

Japan in particular. There are other kind of competitors to them. In China, companies like moonshot, AI, minimax, 0.1 AI, which we've we've talked about their, series of models before. These are really important companies. They've been dependent on government funds and, well, in large local cloud providers as well, companies like Huawei to fund their growth because there's no external capital pouring in.

You know, you've got companies like or investors rather like SoftBank and Tiger Global that, you know, in other countries like India would come in and make those investments, but they just haven't. They've been sitting on the sidelines through this push again because of all that pressure. So, you know, this is, I think, a sign of, frankly, the success of the US policy in dealing with China to date. Kind of frustrating those attempts to raise money other than buy through domestic means.

And yeah, you know, we'll we'll have to see. But one of the interesting things too, is the valuation here is, I believe, is Yifu GPU being valued at about $3 billion. So, you know, that's yeah, starting to get pretty pretty decent at least gets a $400 million investment. Hard to buy that much if hardware. You know if the play here is to be the Chinese OpenAI. Look, OpenAI is sitting on investment sums on the order of like $20 billion.

You know, 400 million is not even, you know, scratching the surface of that kind of investment in a world where scaling dominates, which seems to be the world that we're in. Yeah, this is just really tough. This is a successful, stifling, as far as I can tell of, you know, domestic efforts in China to push their identity. AI research agenda.

Andrey

And on to the Lightning round. We have kind of a related story. First up, UAE seeks marriage with us over artificial intelligence deals and marriages in quotes, which is coming from the Financial Times. And it is based on some quotes from Umar Sultan al-Ahmar. Varghese I minister, you crushed that, You know, got I got just got it going with confidence.

And so in this interview, this person said that there's a deal in the works in which Microsoft purchased at 1.5 billion at stake in the Abu Dhabi based AI firm G 42. And this would be one of many collaborations between the EU and the US. There's this quote. Now you have going to see the outcomes of that marriage, if I may use that word between both of you, 42 on Microsoft, but also the UAE and the United States. So yeah, very much related story to what we just said with China.

I guess people are picking sides, you could say.

Jeremie

Yeah. And some people are being forced to pick sides. Look, this this seems strategically in some sense good for the US with a pretty important asterisk. Right. So in the context of this whole ugly situation, the one thing that during the article and worth flagging, you know, the investment vehicle that the US created here that's going to be worth billions of dollars. It's called MG. And they've been in talks with OpenAI about the chip development plans that OpenAI has.

There have been, there's been word of, you know, Sam Altman going over to the UAE also to talk about, you know, can we build the next sort of giant data center? And frankly, these are the data centers that when you talk to folks at OpenAI, a lot of them are starting to talk as if the current data center builds, which are the ones that are going to be supporting, let's say that 20, 27, 28 training runs. These data center builds internally are believed by many to be the AGI data center.

Like they are talking as if, you know, increasingly the story is coming into focus. We're starting to know you'll get a sense of like what the actual infrastructure will be, what the data centers will be that actually train the AGI run. Obviously, they could be wrong about that, but that is an internal belief in many quarters of OpenAI. Why do I say that? I say that because.

To the extent that we're planning on basing some of these data centers in the UAE for a training run of that magnitude, even if you know they're completely off about their predictions, certainly these are going to be models that are hugely capable. These are national security

assets that we're talking about. And so, you know, you really want to think about do you want those training runs to happen there, or is there a US national security imperative to try to ensure that if you're going to do that, then what you need to do is find a way to deregulate nuclear and natural gas power. And there's a great anyway, we'll get into it later. But there is a great blog post about this published earlier this week by a former OpenAI guy.

This is really the thing that's bottlenecking US AI development right now. We don't have the energy infrastructure to build the next data center or what we do, but we're getting more and more bottlenecked by that. And that's why we're having to look at other jurisdiction. That's really bad from a national security standpoint. So deregulation of of nuclear and other forms of energy like natural gas, really, really important that to kind of protect that national security imperative.

Andrey

Next. Zoox to test self-driving cars in Austin and Miami. Zoox is a self-driving car company. They've been around for a very long time, and for a while now have been owned by Amazon, and they're announcing plans to begin testing in Austin and Miami, in addition to the existing test cities of Las Vegas, San Francisco and Seattle. And Zoox is a bit different, from some of the other companies cruise and, Waymo.

They have a purpose built robotaxi, which is a vehicle with no steering wheel or pedals, and this side doors that side open, but that is not fully tested. They have a retrofitted kind of normal car that will be tested with safety drivers on board. So we could be, you know, kind of a fair deal behind. So police and Waymo. And so in terms of try to push things out there. But to me this is still very interesting. Just because Waymo is already offering this as a commercial

service. We are expanding to L.A., and try to expand kind of seemingly pretty rapidly. Cruise is seeking to come back and start offering things probably again in the near term. So this whole robotaxi, you know, area seems to be. Likely to be really hitting the ground in the next year or two. To me, like, I think we'll start seeing robotaxis a lot more in the coming year or two.

Jeremie

Andre, are you trying to say that the rubber is going to start to meet the road?

Andrey

I think so. I think that is an appropriate way to describe it.

Jeremie

Am I to think it's an appropriate way? This guy, he thought. Sorry. That's terrible. Yeah. No, it seems so interesting. I'm always sort of amused by it. Like, you know, we try to follow the space really closely. The podcast, among other reasons, and mostly, at least for me, if the other reasons, I had not been tracking or their progress. So it's really interesting to see there's yet another player in the space.

They do seem to have a like a testing protocol that apparently is somewhat distinctive. You know, they're saying that they they first start by looking at some specifically preplanned route. They're designed to be especially challenging, like challenging scenarios come up a lot driving features. And then they also do some random testing of point to point route within a certain geofence.

And so they're the rollout is going to start with the kind of focus testing area, the areas that have been geofence and then expand out from there. I'm not sure if this is actually standard. It seems from this article that it isn't. So I'd be curious if that turns out to be like a, you know, a better play. But anyway, good for Zoox. Kind of interesting. We'll see if they end up taking all.

Andrey

Next story, Microsoft lays off 1500 workers and apparently blames. Quote I wave a little bit medical here. They're laying off between 1000 and 1500 workers from its Azure cloud and mixed reality departments. And according to the story, Jason Zander, an executive vice president of some stuff at Microsoft, stated in an email that layoffs align with a company's long term vision and strategy for

AI. So, you know, maybe a little bit overstating how this is blamed on the AI wave this explicitly, but capturing a bit of a trend that's been going on for a while within tech more broadly, of a lot of layoffs since like two years ago, and increasing focus and moving of funds towards AI from other parts of these big giant companies. And this is just reaffirming that.

Jeremie

Yeah. It's true. It's also it's hard not to miss the parallel here with meta. Right. This is, in large part, a reorg that is swapping people on the, the metaverse side or the kind of, augmented reality side to the AI side. Right? So kind of kind of interesting that that trend, the, you know, one, one hype wave may be getting getting way to another, though arguably the AI wave isn't quite hype. One of the things that people often chalk this stuff up to too, is just systemic over hiring, right?

We just came out of a, a period of time where interest rates were super low. Companies were, you know, hiring like gangbusters. The stocks went through the roof. Now things may be cooling off just a little bit. That may be part of this. But, you know, it's 1500 highly technical workers maybe worth taking note of. So there was a very dry statement written by this guy, Jason Zander, who was EVP of strategic missions and technologies at Microsoft.

He sent an email out and was saying that, quote, or a clear focus of the company, is to define the AI wave and empower all our customers to succeed in these in on board systems. It's a long thing and it's all bureaucratic speak, no real information contained in there. 1500 people gone basically, you know, AI wave maybe. Maybe not.

Andrey

Do you want to skip these next two stories? I think.

Jeremie

A yes.

Andrey

Yeah, yeah, I think these are not super important. And you can.

Jeremie

I think the wrong one might be, but I think we can. I mean.

Andrey

It's a little ledge.

Jeremie

You know, but it's an Elon pledge. It's definitely.

Andrey

Going to. Exactly. And one last story for a section. The title is Avengers Assemble. Google, Intel, Microsoft, AMD, and more team up to develop an encoded interconnect standard to rival Nvidia's and the link. So there you go. They have formed of this ultra Accelerator link promoter group to develop a new interconnect standard for AI accelerator chips. And they want this to be an open standard that allows multiple companies to develop AI hardware using

the new connection. Similar apparently to Intel's Compute Express link. And seems like this article goes into a lot of technical detail with you. Some. This is expected to be based on something that AMD has proven infinity architecture. So yeah, you're saying, you know, a lot of companies want some alternatives to Nvidia as you would expect.

Jeremie

Yeah, that's definitely the desperate play here. We've seen this in almost every dimension of what Nvidia does. Some combination of like Google, AMD, Intel, meta, Microsoft. You know they're all trying to like in Broadcom. You know we'll we'll try to chip away at the different things that make the Nvidia ecosystem what it is. Look NVLink is a big part of that. You know you can think of it alongside the Cuda you

know Cuda ecosystem. You can think of it alongside, you know, the actual chips themselves. It's all a package that makes Nvidia this very sticky environment. So this is another big push. Like all things in hardware we are talking about the distant future. That's unavoidable right. So when you look at this interconnect play, yes, it's about setting up a new standard that would be open source. Right. By contrast to the closed source approach that Nvidia is taking with their in NVLink setups.

But ultimately this is going to take a while to hit the market. Apparently it's going to be in the next two years really before there is any kind of new interconnect tech that's actually baked into any products. You could use your Bios, so Nvidia still has some time. Take advantage of that. That head start. So not going to be hitting the market anytime soon.

Andrey

And on to the projects and open source section. And we begin with a new open source model coming from Chip, who I wish we were just talking about earlier. Model is GLM for nine to be. And so this is yeah pretty much like a ChatGPT ask model. So we have a GLM for nine B. We also have GLM for nine B dask chat which is aligned to. Human preference is to be better. And they also have a variant with, context off of 1 million tokens. So a few variants of this model over here.

They say that it supports 26 languages. There's also a modal variant of this. You know, they say pretty good in evaluations, as you might expect. In fact. Seemingly better than some models like mistrial of a pretty behind the frontier models, I think, you know, add it to a stack of models in that eight, seven, 9 billion range that you've been seeing.

Jeremie

Yeah. Open source, by the way, underrated as a, geopolitical battleground. And I'm sorry I'm talking so much geopolitics this episode. I have just been to DC. This is where my head's at. Yeah. But it is worth noting, like a lot of the models that we've seen from like 0.1 I right there, they're U-series models. I was immediately curious when I saw this thing. I was like, I wonder what the licensing terms

are. So I went to the license and lo and behold, the license is governed and construed in accordance with the laws of the People's Republic of China. Any dispute arising from or in connection with this license shall be submitted to Haidian District People's Court in Beijing. And interestingly, you're told you're not allowed to use the software to engage in any behavior that endangers national security and unity, endangers social, public interests and public order.

And as we just heard, the People's Republic of China will determine what that means. So if you're an entity planning on using this, you're effectively operating under the jurisdiction of the People's Republic of China and the Chinese Communist Party. So this is a really interesting way of waging a kind of tech warfare, in a sense, where you can tie the kind of licensing models to, yeah, to your sort of geopolitical objectives. That being said, I think this is a really interesting

model. It was trained with 10 trillion tokens. It is being. So there are a couple of different versions of it. You've got the base model which is the 9 billion parameter model that by the way they say this is the open source version of the latest generation of models they've got. So that might imply that they're, you know, additional close source versions to that aren't being released, but they have that base model, the 9 billion parameter model. There's a shack version of it.

So presumably this is a dialog and and perhaps instruction fine tune version. This one is interesting. It's called GLM for 9 billion chat. This was really interesting because it includes a bunch of features that have been trained into it specifically for web browsing, code execution, custom tool calls, a function calling that baked into the model really interesting and long text reasoning. It's got a context, this one over 128,000 tokens. So pretty impressive. And it compares favorably.

They claim to llama three eight to be in a whole bunch of things mathematical reasoning, coding, general knowledge and all that stuff. And I have heard that it apparently works better on low resource languages, rarer languages for which you have less training data than even llama 370 B so so that's a much, much bigger model than you have here, a model that seems to outperform that.

There is also a chat very, by the way with a 1 million token context window again like this is an open source model with a 1 million token context link that is about 2 million Chinese characters. For those of you keeping up at home. And I looked at the eval data on this one. That needle in a haystack test, that all interesting needle in a haystack test where you you played to fact somewhere in a giant prompt, you're going to feed the model. This is often used for these long context window

models. And you could test the model's ability to recover that fact. Right. So you mentioned like my favorite color is red somewhere in the middle of all of Shakespeare. And you're going to ask the model what is my favorite color and see if I can recall it? Well, it basically nail the needle in a haystack, test across the entire, token context. So no, no matter where in that input you place that, that nested fact, it's going to, it's going to pull it up with almost 100%, recall.

So that's pretty impressive. The last model is this GLM for v nine. That's a multi modal model. So it's got vision as well getting 8000 token context window. But this one was a little wild. It actually seems to outperform in a whole bunch of different different evals. Models that include cloud three opus GPT four turbo. At least the version that was back in April, and Gemini 1.0 Pro like this is really,

really impressive. We're going to wait to see obviously what the what the actual kind of open benchmarking looks like when people actually take this to Huggingface. But for for now, these results team really, really impressive, especially on like the long kind of long context window oriented evaluations like long bench at this thing seems to do really well even compared to absolutely top the line model like Gemini 1.5 Pro, like code three, but so really are open draws.

So really impressive result. But you know, as ever, if you're going to build on top of this, keep an eye on that license because that is of.

Andrey

It's a little bit more restrictive than some of our licenses we've covered. And next up, how you face an. Poland Robotics show off their first project, an open source robot that does chores. So just recently they we talked about how Huggingface has started working on open source robotics. In particular, they launched this little robot repository that is a software offering that kind of provides a unified API to a lot of data sets, a lot of. Things you need to do, remote control and so on.

Now there's a new initiative there called reishi two, a humanoid robot designed by in robotics. And it is an from an open source robot company that is based in France. And so I guess we have more French AI being shown off here. And we're switching to robot has apparently been trained to perform various household chores and interact safely with humans and pets. Is a kind of pretty cute looking robot.

Doesn't look super humanoid, not as advanced as something like you know what Tesla has or bigger has, so on. Now they say that the tele operated it to collect data and then. Had it be able to do some of these tasks on its own, like taking an object and moving around, so on. So yeah, I think very interesting. Huggingface is a huge and influential company. So exciting for me to see them continuing to push in this direction.

Jeremie

Yeah, apparently Richie two is going to be coming soon, so, you know, that's what the website says anyway. And supposedly going to be a big leap forward. That piece about the Teleoperation two is really interesting. Makes me think a little bit of, oh, what's that company? Jordy Rosa's company, their sanctuary.

Sanctuary I right. We've talked about them before, that approach that they're taking as well of using teleoperation like humans wearing VR headset to actually train these models to imitate AI through imitation, learning what they're doing. And it's an interesting question as to whether that's the thing that takes off or whether, you know, something more like a limb grounded or some combination ends up being the

way. But definitely an interesting, interesting, it's that the, the product that they have, their so-called flagship product, Ricci one does seem also pretty cheap looking at these prices. I mean, you know, about 10,000 bucks for the cheapest version with a whole bunch of augmentations that you can make and all that stuff. So kind of interesting bit of a bit of a DIY play, almost like a, you know, Raspberry Pi for humanoid

robotics. But we'll, we'll see if it if it takes and if it ends up being the being the way.

Andrey

Yeah, that is very cheap compared to a lot of hardware. Do you could pay that much just for a single arm much less. And a whole like torso, a couple arm there.

Jeremie

They, they sell an arm for like €10,000. So if you got if you got €10,000, maybe, maybe buy yourself another.

Andrey

And out of lightning round that. First up, we have another story on a data set. The story is that Sephora has defeated Zeta, a 1.3 trillion language modeling data set that it claims outperforms other ones like pile C4 and so on. So this is Zafira technology, and we've created this 1.3 trillion token dataset by combining a bunch of other open data sets like refined web star, golf star quarterback, Z4 pile,

a bunch of them. They apparently looked at all these data sets and then duplicated them, so made sure that kind of all the best parts of all of them are combined and you don't have low quality or. Kind of copies of data that polluted. And so yeah, they've released it now. And they say that when you train on it, compared to some of these other open things, your alarms learn better and do better. So this is a pretty big deal.

I mean, having open very, very large data sets is very important for people to be able to compete with. The likes of OpenAI and a tropical have their own internal data sets. Data is super important, almost just as important as compute. So interesting to see some companies continuing to push on that in the open source wars, as we've been, I guess, seeing a lot.

Jeremie

Yeah. That's true. And you know, when you look at the open source data set, so many of them start kind of the same way. You know you've got your your standard resources, you've got your common crawl kind of Wikipedia, Google News Corpuses and all this stuff. And so what they're pointing out here is the reason that deduplication was so important is that all of these open source data sets, they all kind of have the common crawl foundation or, you know, the Google News foundational, all

that stuff. So they were saying apparently in total, they actually ended up getting rid of about 40% of the initial data set, going from about 2 trillion to, well, the final count of 1.3 trillion in this deduplication effort. So just what it takes, you know, it's not as easy as just combining all those data sets together. You're going to get a lot of duplication and obviously a lot of work to clear those out. So there we go.

Andrey

And last story is stability I the boots I do. Why do others love this word. It's so hard to say. Stability has released a new stable audio open for sound design. So you just mentioned a bit earlier how stability. I launched Stable Audio and actually Stable Audio two was released just, a couple months ago. Well, now they have stable audio open 1.0. So this is the open source variant of it. So you can use it for your own purposes, fine tuning it.

Although apparently this is still available to users under the Stability Noncommercial Research Community Agreement license. So you still aren't able to use it for commercial activities. And unlike Stable Audio, Stable Audio open is meant for shorter clips meant for more sound effects, things like that up to 47 47 seconds long and apparently has been. Trained in a responsible way. So they trained on audio data from Free Sound and Free Sound Free Music archive.

So there's no copyrighted or proprietary software.

Jeremie

Yeah. And there, you know, in a bit of a shift from their original model, I think stability now ironically trying to find a stable business model. You know, they've struggled with that in the past. They're now obviously not getting the full open source treatment to this model.

So you know noncommercial only which is well anyway, I think a sign of things to come for for stability as they try to figure out how do we monetize, you know, how do we monetize given this high scaling race, given the insane costs of inference? You know, at a certain point you're going to need to find a way to not just give, well, I guess free cost of training, I would say, because this isn't about serving a budget like open source in the models.

So, yeah, I think, you know, keep an eye on how exactly stability sets up its licenses going forward, because I think it's going to tell us a lot about the future direction of the company.

Andrey

And onto the Research and Advancements section. And we begin with a pretty exciting piece of research from OpenAI, a, titled Scaling and Evaluating Sparse Autoencoders. This was paired with a blog post from OpenAI titled Extracting Concepts from Gpt2 four. And if I may give my take on it real quick, I just have a root set. I think it has a lot in common with something we discussed a couple weeks ago from on topic.

So in Fabric Releases blog post titled I think Mapping the Mind of a large language model or something like that, along with a very long paper. In that paper they trained an auto encoder on the outputs of a bunch of units in a large language model, and showed how when you train this secondary model that essentially compresses the outputs of a bunch of the neurons, you can find that some of those compressed representations correspond to very interpretable features.

And famously, there was a Golden Gate feature where some set of activations corresponded to the Golden Gate Bridge. And there was some fun stuff that are propagated through a revamped. Well, now we have this research from OpenAI, which really is to me seems very similar from a technical perspective. They have slightly different approaches. They use a sparse auto encoder, but the basic aim here

is the same. Take some outputs from the large language models, train a compression of it to be able to interpret to, as they say, extract concepts from GPT four, and they actually say that they use this approach to find 16 million features in TubeBuddy for things like apparently a features related to human imperfection a rhetorical question. Various things like that.

And in addition to the research, we also released some code for doing this sort of training and interactive feature visualizations similar to what anthropic also released. So interesting to see these lines of research coming out close together. I think very likely just both of the companies were working on it in parallel because this was not, let's say, entirely novel or there was a lot of work leading up to this. But as Whovian gave her personally, I think this is really exciting.

We are seeing some very nice interoperability and alignment possibilities out of this research. So yeah, very exciting.

Jeremie

Yeah, for sure. And actually, you're very right about the I mean, even down to the sparse auto encoders, in fact, an anthropic stunt, a lot of research on those specifically. So this is in a sense it's like OpenAI doing like taking a page out of the anthropic playbook to a certain degree, but doing very interesting scaling experiments with these autoencoders before we get into it. On a political note here.

So if you look at the author list, there are two people on the team who are no longer OpenAI, and that's Ilya Sutskever and Yann, like a Yann, like, formerly one of the heads of the super limited team, along with Ilya, he has gone over to anthropic. We recently learned. Right. So it really is now it's very much an anthropic sort of line of effort.

This seems to have been their last big piece of work, their last salvo before before they were taken out into the back alley and shot or just moved out. I don't know I to say.

Andrey

Left on good terms. Okay, let's not.

Jeremie

Collect on the accelerator. Yeah, yeah. So so we have here that of course there's a joke. We, we don't advocate for, for, for violence on the podcast because otherwise we, we wouldn't be able to, you know, be on YouTube. Okay. So a couple of things about this whole autoencoder setup. So the way it works fundamentally imagine that you have a big giant blob of neurons and activations of those neurons. Right. That's what goes on in deep learning.

You would love to find a way like there's probably a better way, but smaller, more efficient, more compact way of representing all of that. You could probably compress that. Think of it literally like a zip file. Find a way to compress that represented with fewer numbers. Right? Instead of all the neurons, all the activations of all those neurons, maybe we could come up with a shorter list of of numbers that captures the same meaning. And that's what autoencoders do, right?

So you're going to take this like this giant mess, this giant blob of numbers that is your neural network. And you're going to try to encode that in a lower dimensional vector. And the way you train your autoencoder is you're going to compress your neural network representation and then try to reconstruct them from that same small set of numbers. Right. So that you can call the dimensionality of the autoencoder in right.

In only a small subset of the entries in that autoencoder are going to be allowed to be non-zero. That's what makes it a sparse autoencoder. So you have a small list of numbers. In fact, even though that list of numbers is small, the vast majority of them are forced to be zero. So only an even smaller number of numbers in that list is actually allowed to be non-zero to contain information. And that is really important, because having a smaller set of numbers makes it a lot more interpretable.

You can build models on top of that representation to make predictions. For example, about hey, is there a concept that is being captured in my in my kind of latent, my shorter autoencoder vector, and that kind of allows you to do some interesting interpretability? There is a whole bunch of detail in here. I read the paper with great interest. I think it's worth looking at if you're technical.

The long the short of it is. They explore a lot of interesting ideas around the scaling properties of these autoencoders. Right. So how, for example, does the size that the end the dimensionality of the autoencoder. Have to change as you increase the size of the model that you're trying to model to, to kind of to compress, if you will. Right. So unsurprisingly, the larger the model is, it turns out the larger the autoencoder dimensionality has to be. If you want to achieve the same loss.

And then you can ask the separate question, how then should the number of non-zero numbers within the autoencoder scale? That's a separate scaling question. They go into that too. They derive a bunch of really interesting scaling laws in there. Anyway, we could go on forever on this, but maybe I'll just park this by saying the last piece, which is metrics are really hard in this space.

It's really hard to figure out what is the metric that measures how good your autoencoder actually was, because it is meant to capture meaning, right? It's meant to allow you to interpret what's going on in your network, but there's no number that you can easily turn to. The captures that meaning, like the number that you have, is the reconstruction loss to how much, let's say how much.

Of accuracy you lose when you reconstruct, when you rely on that compressed version of your, autoencoder to reconstruct the original full model. So how how faithful is that reconstruction to the original? That's one number that you can measure. That's what you actually use to train the autoencoder. You try to minimize that distance, but that's not really

what you care about. What you care about is things like, you know, does this autoencoder actually recover features that we think it should have. Right. Does it capture the idea of a car. Right. Or the idea of running. And they have the anyway for metrics that can that can kind of detect that or capture that. And then you know, how much how much performance do we sacrifice if we use only the parts of the model that we

can interpret? So if you just look at the interpretable bits of the autoencoder and only use those as information used to reconstruct the model, your how much how much performance sacrifice. These are all really interesting technical questions. They're all in the weeds. We could spend an hour on this paper, but unfortunately we can't highly recommend checking it out if you're if you're into the the whole interpretability space.

Andrey

Yeah. A lot in this paper. And just to reiterate, I guess I think these approaches are starting to challenge a little bit the notion that we have no idea what's going on inside these models. Right? This is a commonly stated thing. I think you said it in your Joe Rogan appearance even.

But with these sorts of visualizations or condensed interpolated features, we can at least detect some of what's going on and steer it, as we saw with anthropic thing, where if you find a feature that deals with drugs, you can literally, at inference time, set it to zero so that it gets much harder to jailbreak, to get them out, to do things, because you no longer just train the model to avoid seeing behavior behaviors, you can detect the behavior at runtime and avoid it. So yeah very exciting.

One more technical detail is very for kind of explaining those features they build on somewhat from last year, called you to graph, where you can basically associate a feature with, word or a couple words. So we don't have sort of, full explanation. We have a semi automated way of doing it. And they do provide some fun, interactive, visualizations as one of the links. So if you're curious, that could be fun to play with.

Features like apparently one about humans having flaws and police reports and identification documents, stuff like this. So yeah, if you're going to check out a paper, this one is is probably a good one.

Jeremie

Yeah. And I think, you know, to the point of the you rightly raised, though, you know, whether this challenges the notion that these things are interpretable. I don't think on Rogan that we ever would have said that with the, you know, we have no idea what's going on on any level in these models, right? But, you know, from the standpoint of the safety implications, unfortunately, this kind of technique isn't like it's so nascent.

And the scaling properties of it right now, you know, are not necessarily going to be able to catch up with the scaling of the models. But it's an interesting avenue, and I think it's something that we really want to, you know, you really want to see pursued. That's part of the reason why I'm so excited that John Mica finds himself now at anthropic, where he's going to be doing some of this research. But I think our next paper might be even more to the point of the point you've

been making. It's it's the the anyway, we'll get to it, but I think it's almost directly on the nose in terms of how can we better understand and actually control the behavior in a more robust way of, of these models.

Andrey

All right. Let's get to it. So the next paper is improving alignment and robustness with short circuiting. And yeah as you said it's very much related to what we just talked about where the idea of short circuiting is related to a lot of other approaches we've seen with steering, where essentially you directly interface with a model, you mess with its state at runtime, and so you've short circuited this response and its ability to do something harmful rather than having to train it to

not do something harmful. You directly kind of impose it on it then. Jeremy, I'm sure you're kind of excited about to sell it to dive into a deeper.

Jeremie

Yeah. I didn't, did it? Did it show? You're so. You're right. It's inspired right by these this notion of activation engineering I like that term. This paper use the term representation engineering. But anyway sort of same idea. What is that field. Well that's the field of taking a model. You know you give it a prompt. And then of course the neurons in the model are going to get activated just like the neurons in the human brain. Right. Some of them can spike and get excited.

You can figure out sometimes which neurons are associated with a given concept. So for example, the concept of deception, right? Feed the model a bunch of inputs that are associated with deception. See which neurons or groups of neurons consistently light up. And that can give you a hint as to which neurons are involved in that behavior or that concept. And then you can think about doing things like, I don't know, this, drawing out those neurons and trying to remove

the behavior. It's a very kind of cartoonish way of thinking about it. Then that sort of thing actually has been shown to work. This paper is actually not just about interventions at runtime. It's actually about a training process. So what we're going to do here is we're going to start by making two data set right. So the first is going to be a data set that's meant to activate the the neurons or the representations that are associated with some bad behavior.

Right. So think here of, you know, bio weapon designs, cyber attacks, deception, all that bad stuff. This is the evil data set, right? The evil data set that we're going to use to activate the evil neurons. Right. Very caricature ish. The second data set is going to be good data set. Right. This is a data set that's expected of prompts that are expected to lead to fine, upstanding, decent behavior that you actually want to retain in your model.

Right? So your regular sort of instruction response prompts, good dialog, whatever. Right. Next, what you're going to do is train the model, and you're going to train it in order to map. So every time it gets, inputs in the evil data set, you're going to train it so that the representations you end up with get mapped to random

values. Basically, you're going to completely fuck with the output of the model to train the model to, like, behave completely incoherently when it gets inputs from the evil data set and the good data set, you're going to try to train it to replicate the original behavior. Just sort of that behavior doesn't change in the model. They're specialized loss functions that they use for this.

So, you know, you can think if you're a bit technical, you can think about trying to, you know, minimize the, the L2 norm between like the, the random. For the eval data set, you want to minimize the data set the L2 norm between some random set of activation values. It waited by some parameter and you're, you're actual kind of trained deviations. But anyway, that's just technical detail. And this is actually a really interesting paper.

It works really well because what you're doing is you're not unlike fine tuning. You're not training the model to just try to refuse an instruction. Right. You're training the model to to just completely collapse when asked to display dangerous

capabilities. You're actually getting at the the actual, like profound latent capabilities of the model in a deeper way than just sort of like adding some fine tuning on top where you have a model that could help you design bioweapons, for example, but just tries to tell, you know, and then it's just a question of like when somebody finds the first jailbreak, what you're actually doing in this instance is you are training out in a very aggressive

way. The capacity, the latent capacity of the model to even do those thing. This leads to some really impressive results in like because it's such a robust technique, it allows you to take away bad capabilities from agents as well as multi-modal models and standard language models. They see a really dramatic decrease in attack success rates. Basically, the ability to use these things to do

bad things. Roughly speaking, using this technique relative to refusal trained models or models that have special safety prompts, it is a dramatic improvement. And one of the most important things about it is that you don't suffer from the same kind of catastrophic forgetting that you do with a lot of fine tuning strategies, right?

So with a lot of fine tuning strategies, I sort of training the model, you know, hey, now that you've been trained on the whole internet to know a bunch of facts, I'm now going to give you some extra training to refuse requests to make bioweapons and bad stuff, or that extra training can actually cause you to forget the stuff that you once knew. What this training process does, partly because it's also being trained to remember the skills that it does have through that good data set.

It preserves those capabilities while nuking the bad ones. So I thought, this is just a really interesting paper. It's another one from Dan Hendricks who is at the center for AI safety, sort of known for a lot of his work on the representation engineering side. Anyway, really interesting result. And and more than I think anything I've seen in the last couple of weeks, this is one of those papers that make me go, You know, it's your point, Andre.

Like, you know, this is really starting to give us not just diagnosis, but also treatment for a lot of these sort of extreme risks of misbehavior and weaponization. The challenge, as usual, is, you know, what happens when you actually get to the sort of superintelligent domain where all bets are off, but it's still very impressive, very important result.

Andrey

Yeah, that's very cool, I think. And it's yeah, to your point, I think it's kind of showing that this Asatru, you're making progress, you know, with large language models, foundation models. We were in a sort of nascent state where things like alignment to just say, like say no when you're told to describe out of hack something, right? Well, not too surprising. You could jailbreak it and get around it.

But these kinds of approaches that are, let's say, the next generation or the next step of basically just make it so a model isn't capable of doing these things, seems like kind of the next logical step to take. And I agree that this seems pretty exciting.

Jeremie

Yeah. That's also just as a last quick note, that's where we start also to move into the domain where it's like it's not just a technical problem anymore. We have, you know, potentially some some techniques that might be helpful for this. It's a policy problem to make sure all the labs that ought to use these things are using these thing. Right. So that's kind of where the anyway the whole policy story becomes inextricably linked with the technical one. But exciting result here.

And hopefully we'll see more like this.

Andrey

And art. Riding around that first story is automatic data curation for self-supervised learning, a clustering based approach. So they say that having just clustering of data on a lot of diverse data to obtain clusters. Can be used to curate data to sample it for self-supervised learning. And this is one you highlighted to Jeremy. So I'll let you go into more detail on that one.

Jeremie

Yeah, I know for sure. So one of the big challenges, anytime you want to make a data set to train your model is getting sort of, wide coverage of concept and ideally a balanced coverage of concept. So, you know, what you'll typically find when you collect a bunch of data is a big lopsided distribution where some topics are a lot more covered than others.

And, you know, that can that can be, for example, if you think about ImageNet, right, this classic data set with a thousand different, image categories, the like image of the category of plumber shows up some. I forgot what it is, but it looks like some ungodly fraction of the time, you know, so the image recognition system was trained on internet, a really, really freaking good at identifying plumbing, plungers. But, you know, that's that doesn't really map on to the

real world. And so when you want to do, for example, in the case of language modeling, you want to collect these big language modeling data sets. You want them to be large. You also want them to be balanced and kind of diverse in that sense. Right? So when you just do a big internet scrape, what you'll find is an awful lot of concepts show up very, very rarely. And then you have a couple of dominant concepts take up a large portion of the data set.

So this paper is about figuring out automated ways to balance out that data set. And they roughly speaking they use this clustering strategy. If you're in data science or ML you know you'll know this. It's K-means clustering. They're basically going to figure out a way to cluster automatically concepts, in a way that's balanced.

And this is actually kind of challenging because if you just use your standard K-means clustering strategies, what you'll find is you'll get a whole bunch of like, if you have a concept like, I know rain, right in the UK or in places where they get a lot of rain, people tend to have a lot of different words for rain. They tend to talk about rain a lot, and so you'll have a whole ton of different nuanced takes on

rain. And if you do a standard K-means clustering analysis, you'll end up with a kind of. Big clusters that are sort of resolved into many small clusters. And so you'll kind of overrepresented. You'll have a bunch of different cluster centroid all around this, this one topic. So clustering in the conventional sense doesn't really solve your problem. You end up anyway. Whatever. Replicating the same problem with your clusters.

They use a sort of hierarchical clustering approach, where they apply K-means to identify first level clusters, and then apply again until they get a uniform distribution. The math really is a little detailed here, but fundamentally, this is an attempt to deal in an unsupervised, automated way with this problem of rebalancing data sets to make sure that you get a more uniform distribution of concept in that dataset for for training purposes. That's the idea.

There is a lot of math in it. It's kind of hard to describe the geometry of the problem, but if you're interested in data set design, data set curation, development, this is a really good people to look at because it does seem like something you could use if you're in a, you know, a small startup. You know, you need to do things in a very efficient, automated way. Yeah. It looks it looks like a pretty interesting strategy to use.

Andrey

Yeah. And, automatic data curation is something that presumably anthropic open AI, all of them have an approach figured out, too, because at the scale you're operating at, you know, it's it's unthinkable amounts of information and data. And so you have a lot of garbage in there. And you got to find some ways to, as you said, the duplicate maybe balance the data. So this was just one example of the sorts of things that you might need to do.

And one more story before didn't ace the bar exam after all. MIT research suggests so. OpenAI has claimed that GPT before scored in the top 10% on the bar exam. But some new research suggests that that figure was skewered towards repeated test takers who had previously failed the exam. So that's where to be for us, placing in the top 10%, apparently, compared to just first time test takers, GPT four scored in the 69th percentile in the top 31%, and.

Yeah. So it's it's still good, but not quite as good.

Jeremie

Yeah. They paint a pretty nuanced picture of this. And I appreciate this because, you know, this was one of the examples that I would often use the uniform bar exam just because sometimes I'd get prestigious like lawyers and things like that and lawyers. Pay more attention when you talk about their own. Understandably, their own tests. A bit of nuance here, right, though. The challenge seems to be. With the test that opening I ran.

Was, against a population of test takers where you would sort of expect people who fail the test, and are taking it again, they tend to kind of clog up the system and take up a disproportionate fraction of the of the seats of the test taking seat. And so the argument here is going to be, well, you know, you're claiming that it achieved 90th percentile performance. But really it's kind of like it's beating out. Yeah, a good fraction of first time test takers.

But you've got all these people who kind of suck at the test and are taking it for the millionth time. They're never going to pass. And and they're, you know, making up the rounding at the bottom of the bell curve. Right. So the when they zero in on, on just first time test takers, they find it's like, you know, the 69th percentile or sorry 40th percentile

for them. And when you adjust more broadly for what they argue is a more calibrated population of test takers, you get the 69th percentile, which, you know, still like I'm old enough to remember when that was considered really impressive. Impressive, but but still an important adjustment to make here. They go on to argue that the specific areas in which the model performs poorly, are the ones that are maybe closest proxies for for what a practicing law actually has to do for the essay writing

section in particular. So the argument here is going to be that GPD for not only doesn't hit 90th percentile, but the ways in which it failed to hit the 90th percentile, or some of the most important for assessing the actual skills and capabilities of a practicing lawyer. So maybe GPT four is more like a, crappy lawyer, and we don't need to worry about it so much in that sense. One kind of slightly funny thing, there was an email that was sent to OpenAI about this.

And in response, there was an OpenAI spokesperson who just referred the publication here, Live Science to Appendix A on page 24 of the GPT four technical report. And the relevant line says, quote, the uniform bar exam was run by our collaborators at Case Text and Stanford Codex. So really classy way of just throwing the collaborators under the bus. Got to appreciate that. Well done OpenAI. It it ain't my fault, says OpenAI. And hey, maybe they're right.

Andrey

Yeah. I mean, to be fair, that's a that's an understandable error. It doesn't seem like maybe they intentionally didn't realize that reputation was skewed towards repeat test takers. Yeah, it's more of a, sort of clarification of then calling out OpenAI for overhyping their capabilities. I feel.

Jeremie

Yeah, it's you're right. It's a nuanced sort of thing that could easily be missed for sure.

Andrey

And on to policy and safety. And we begin with a story about a former OpenAI researcher who is claiming that we'll have AGI in 2027 and much, much more. So this has been making, a bit of a wave cross AI commentators and Twitter, you know, users. The former OpenAI researcher in question is Leopold Austin Brenner. He was a safety researcher and has been kind of around AI safety discourse for a while in general.

And the story covers this very long document that he just put out this past week, titled Situational Awareness The Decade Ahead. I think as a if you read it as a PDF, it winds up being something like 160 pages. And so one of the claims is that we are most likely going to start to see AGI within a few years. It also goes into how we'll likely have superintelligence not too long after that. It has a lot in it. Let's just say given it, it is 160 pages.

And so yeah, this has fostered a lot of discussion. It has a lot of sort of, suggestions or thoughts regarding super alignment regarding, geopolitics, things like that. I'm sure, Jeremy, you took the time to go a little bit deeper into reading it when I have. So yeah. What is your take or response to this?

Jeremie

Yeah. Well, I think, you know, one day I may have more more to say about this case in particular, sort of in the, the, in the back end there and what went on there. I think some important details that I can, I can surface publicly. And it's like part of the public story here. Yeah. He was fired for allegedly leaking information from OpenAI. There's a lot of let's say, there's a there's a lot you could say about the culture of OpenAI and why they chose to leak him his allegations.

Certainly. And it seems credible to me, based on what I know, his allegation is that, in fact, he was kind of mostly let go for raising, a red flag around OpenAI AIS security practices, which, you know, we we conducted an investigation of the Siri practices at Frontier Labs. I will say broadly, without naming any particular labs, they have deep and profound problems. They are not capable of withstanding exfiltration attacks from like nation state attackers.

The one of the important points that Leopold was making, or has made, is that you ought to expect at some point these labs to to draw the dedicated attention of nation state cyber attackers who are trying to steal model weights and other controlled, say, in cyclic algorithmic insights. I think he's completely right about this. I don't think you have to wait long. I think that the chances that, anyway, that important IP has been accelerated already are incredibly high.

There's a lot of evidence suggesting that, so, you know, this is simply true. The idea of AGI by 2027, 2028. This is consistent with, my understanding of OpenAI's own internal expectation about when we're going to be hitting that threshold. They may be wrong, right? OpenAI may easily be wrong. They've certainly done well so far, but it's kind of investing in the space and leading it. So, you know, their opinion probably ought to be taken fairly seriously.

And if this happens again, you know, you get into this question of like, where are we basing our or compute infrastructure? Should we really be building servers in, you know, the UAE that we're planning on using for these massive AGI training runs? That's that's one kind of part of this whole sort of orbit of thought. But then the other is, you know, what ought we do on the safety and security side for these

large right now? One of the key things is you really got to lock down the algorithmic insight that are being generated in these labs, because those are the things that make the, you know, $1 trillion training run of 2027, instead be a $100 billion training run. Right. These materially move the needle on different nation states abilities to pull this sort of thing off.

And to the extent that you had these guys just sort of like hanging out in coffee shops and at parties as they do, and talking about algorithmic insights that should be controlled as they do, that is very, very bad. So, you know, AI is is either going to be viewed as a national security issue or it's not. We're either going to get to the point where these are effectively WMD level threats. I think that's likely to happen much sooner than than most

people think. But, you know, whether that's next year or the year after or three years from now, it's still, you know, very soon. And I think the implication is you just got to get. Serious about the security situation in these lot. You know, the US national security imperative is here. It may not be widely recognized, but, you know, being being late to this really means, given how high scaling works, being like quite, quite risky. Dangerous. Sorry. Quite risky. Late to this.

So, you know, we want to get ahead of it a little bit.

Andrey

Yeah, exactly. I think some of these sections in this longer document deal exactly what you've been talking about. One of the titles here is lock down for lab security for AGI. Here's one for super alignment. There's a section titled The Free World Must Prevail. So very much reiterating, I guess, the point that we should be concerned about China. And yeah, we won't be going too far into this, but I do want to quote from the intro, just to get you a feel for what this document is like. So quote.

The AGI race has begun. We are building machines that can think and reason. By 2025 or 2026, these machines will outpace many college graduate graduates by the end of a decade. Will be smarter than you are AI. You will have superintelligence in the true sense of the word along the way, National security forces not seen in half a century will be unleashed, and before long the project will be on. If we're lucky, we'll be in an all out race with the CCP if we are lucky. An all out war.

So stuff like that. Pretty dramatic. Also in the intro, with this kind of claim or just another quote, before long the world will wake up. But right now there are perhaps a few hundred people, most of them in San Francisco, or the eye labs that have, quote, situational awareness and so on. And so, yeah, very opinionated, long document predicting the future.

But I will say, given this is representing someone working at OpenAI, someone working in safety, also be representative of what a lot of people kind of think and predict. So worth taking a look if you find that interesting.

Jeremie

Yeah. I got to say, as somebody who's like, you know, works in national security directly with, you know, anyway, folks in the space. This was one of the most cogent, sober minded assessments of, of the landscape that that I've read. Given the level of data, there are little bits you could quibble with for sure. But but in general, the picture that he paints, I think is is has a good chance of being accurate. At the very least, it's something we ought to be preparing

for. So yeah, you know, kudos to Leopold for putting this together. And I'll certainly be reading his next blog posts.

Andrey

And next up, related story. It is that OpenAI insiders warn of a reckless race for dominance. So a group of nine current and former OpenAI employees have raised the concern about the company's culture and recklessness and secrecy. There is actually an open letter titled A Right to warn about Advanced Artificial Intelligence. And in this letter, they have made a few concrete things.

For advanced account AI companies, they say that these companies should, not enforce any agreement that prohibits disparagement, that the companies will facilitate a verifiably anonymous process for employees to raise risk related concerns. The companies will support a culture of open criticism, and that the companies will not retaliate against employees who publicly share risk related, confidential information.

So pretty directly tied to, you know, the previous story and also the stories we've seen in recent weeks regarding people leaving and the information about, some of the practices at OpenAI in terms of NDAs and stuff like that.

Jeremie

Yeah, it's interesting to see OpenAI retreat into this very corporate defensive mode. You know, it's very different from the OpenAI. We used to see where Sam Altman would kind of come out in this unprepared way and make these very organic, authentic sounding statements. Yeah. The response here and one of the the guy who's sort of leading the charge on this, Daniel Koga tyo, sort of longstanding governance researcher at OpenAI, very vocal about the risks here.

He said, quote, OpenAI is really excited about building AGI and they are recklessly bracing to be the first there. I will say. A lot of the characterization of the space in general, again, without pointing to particular labs, is absolutely consistent with what we heard in our investigation, speaking to dozens of these whistleblowers and concerned researchers at the lab. You look at the response from OpenAI here and I mean, I don't know, you tell me how deeply they seem to be

engaging with this. The response is, quote, we're proud of our track record providing the most capable and safest AI systems, and believe in our scientific approach to addressing risk. We agree that rigorous debate is crucial given the significance of this technology, and will continue to engage with government, civil society and other communities around the world.

You know, that kind of highly, almost political to me, sounding line like, and we'll continue to do X, Y, and Z. It's like almost like, yeah, okay, screw this. We're going to Q2 and whatever we were doing before. I think it's noteworthy to one thing that's very much lost in a lot of these stories is the backstory that this is about three days before our report came out back

in March. But around that time, OpenAI, a OpenAI, came out and said, hey, we're setting up an internal whistleblower hotline so that folks can comment, you know, share their concerns internally. Now, the extent to which that was taken seriously by the people in the lab itself, who were in the best position to assess the seriousness with which that offer was being made, presumably. You can kind of gauge that by the fact that all these folks are leaving OpenAI.

They're not choosing to go through the whistleblower hotline. They're choosing to reach out to the New York Times, they're choosing to, you know, whistle blow in their own way. This kind of, you know, compounds a lot of the questions, unfortunately, that have been raised about OpenAI since, you know, to what extent is this web actually serious? Is the leadership actually serious about living up to their publicly stated messaging? You know, there's a lot of concern about this.

A lot of this is public, a lot of this year we've run into in the course of, of of investigations, again regarding audio, various labs. So it'll be interesting to see if there's actually any movement here. The initial sounds from OpenAI, I don't seem terribly encouraging. It seems very much like business as usual as far as I can tell. But one last note here, too.

That's kind of interesting. So they have that is the the folks, the 11 people who all signed this letter pushing for whistleblower protections. They've signed on a pro bono lawyer. Lawrence Lessig. So he's famous for, advising Francis the Francis Haugen, who's the former meta employee who famously did her whistleblowing about basically, you know, matters putting company profits ahead of safety, the effect that this stuff is having on users and I think

teens in particular and all that. So, you know, this guy has a long history and track record of of working in that space. Yeah, it's a really interesting development. It's all been a bumpy ride, man, for for OpenAI lately. And we'll see if there's actually oversight. Right. Because one of the challenges is, yeah, we actually do need whistleblower protections here for these folks.

You know, I, I remember talking to whistleblowers from labs that shall not be named, but, you know, a lot of these guys were really concerned that the lab would go after them if they shared, you know, whatever information they they might want to with us or with anybody else. And so, you know, this is a sort of thing where clearly the culture of the space needs to change. It needs to be much more open.

We need to have more robust conversations, especially when the, you know, the people who know the most about the safety and security situation are the most concerned.

Andrey

That's right. The article title says OpenAI insiders worth noting that, there are two signatories also from Google DeepMind and one person who was an anthropic also. Six of the signatories are anonymous. The ones that are currently OpenAI are all anonymous. There's also a couple anonymous, formerly OpenAI. So there you go. It's clear that even the signatories are a bit concerned about blowback from this. Yeah. Next up, testing and mitigating elections related risks.

So this is about a process called policy vulnerability testing that, is a process that involves a very in-depth qualitative testing that. Yeah, pretty much involves mitigating election related risks. And once again, I'm going to just hand over to Jeremy to go into detail on this one.

Jeremie

Yes, sir. So this is a post from anthropic. They're basically coming out and saying, look, you know, elections coming up. We've got all these models. We've got clod three, all the different cloud three models required to blah, blah, blah. We want to make sure they don't get messy. And so here is our strategy. So in a sense this is sort of a call to all the labs really to come out with their own election interference strategies. It's a three stage process.

Their pipeline, that is, that they lay out here. The first is exactly what you said. It's policy vulnerability testing. This involves bringing in external experts to start by figuring out at a high level, like, what are all the misuse applications that we can imagine wanting to test for. And then, like iterating on those tests with expert. This is a manual process that is, you know, very time consuming and intensive.

Then their second step is coming up with ways to scale and automate those evaluations. And that's important. Just because, you know, once you have the core ideas you want to test for, there's no possible way that you'll manually with human work. Get all the tests that you might want generated to test all the different ways that the system could be used offensively. So essentially it's a nice, consistent, scalable and comprehensive way to like audit the the

risk said. And then the last step is implementing the actual mitigation for these risks. They call out a bunch that they've discovered over the process of going through the cycle. The first is, you know, obviously changing the system prompt, right. There's kind of the meta prompt that quad uses to kind of decide what it should output, what it shouldn't, how it should behave. There's augmenting fine tuning data.

So, you know, maybe fine tuning out certain capabilities or behaviors, their usage policy. And then automated review tools to check user prompts as they come in and make sure that they're they're not trying to prompt the system for dangerous stuff anyway. And so on and so forth. Basically, this is anthropic just trying to be a little bit more open about what their strategy have been so far. It all, you know, kind of makes sense. It's a, you know, policy meets technical paper.

If you're interested in election interference and all that stuff and what the the emerging standard practices are, this is definitely a paper to check out.

Andrey

And one more paper related to safety. This one is teams of lime agents can exploit zero day vulnerabilities. So zero day vulnerabilities are hacks that are essentially unknown, that you know, when you release a product, the zero day means that. It's unknown to the company. As far as I understand, maybe there's a more nuanced way to say it, but we've talked before about how lambs were shown to be capable of exploiting such vulnerabilities when given a description of it.

And what this paper shows is that it's possible to get better performance on tackling unknown zero day vulnerabilities by this team approach of having different agents. So they have, I guess, some special specialties of different aspects of what you would do during hacking. So task specific expert agents. And then there's a planner and manager and a whole bunch of

details. But the gist of it is once you do this, there is significantly better capability to actually do successful hacking with, I guess, pass rates going up to around 60 ish percent compared to if you try and do this with just one limb without describing the vulnerability.

Jeremie

Yeah, yeah. No, it that's it. And I think this is really an important waypoint in the story of AI capabilities. Yeah I remember having a conversations with people like two months ago, about, you know, they'd say stuff like, well, you know, I'll worry about, you know, AI risks of various kinds when we're, when an AI system could actually do a cyber attack, but they really can't.

Right. And one of the interesting things about this is so, so six weeks ago, we discovered, as you said, that GPT four, a model, which by the way, had has been essentially trained. It's been about 18 months since it was first trained. It's been around for that long opening. I brought in the world's best red teamers to discover all the set of dangerous capabilities it might have. They spent months doing it.

They finally launch it. Since then, hundreds of millions of people have tried as hard as they can to elicit all these capabilities from the model. And everybody missed the fact that this stupid model, which has been augmented since. But this base model. Actually, did Latently have the capability? If you gave it a description of a cyber vulnerability that existed, to exploit that vulnerability. 87% success rates. That's what we talked about six weeks ago.

Economically, it is 2 to 3 more efficient. If you have an identified security vulnerability with a description that you feed it. It is 2 to 3 times more efficient. Roughly speaking, they get a human cyber attack or attack dude that attacks. So we've already hit economic escape velocity on cyber attacks with GPT four, and we hadn't realized it for like a freaking year and a half. And all that discourse about, oh, we'll worry about AI systems, you know, all these applications.

When we actually see something, we're missing the fact that the whole time this is actually been possible, well, today we now know it's possible for zero day. So when you have a description of the, of the vulnerability that you can feed to the system, essentially that's a known vulnerability. It may not have been patched everywhere, but it's known to the world. That's a one day vulnerability.

A zero day, as you said, is when there's no prior knowledge of the thing, you just have to, you know, basically plop the the system into the in this environment and just have a discover the vulnerability from scratch. Given the environment, this essentially just works. So you see a pass rate this pass at five eval. Basically you give the model five chances or sorry, the system I should say because it's a bunch of

agents working together. You give the system five opportunities to try to discover and execute on A00 day attack. And 53% of the time it will succeed, right. So that's actually quite, concerning level of success rate for a zero day vulnerability. That's actually kind of not that far from the the one day vulnerability, even when you give it a description of the vulnerability, this suggests that the model is actually really good at identifying those, those weak spots. And they do run.

And a cost analysis for this. What they find is with some fairly conservative assumptions, it is actually already slightly cheaper to get this model to do zero day vulnerability exploits than a human programmer. So again, kind of reaching economic escape velocity on these attacks with this model. And this is with GPT four, the version that OpenAI is serving up. Presumably it's the one that has all the

safeties. Right? So if you had access to the base model without all the safety fine tuning, presumably you'd be able to go even further. So this is it's quite a remarkable paper. You know, I think at this point the the debate is sort of over when it comes to our language models going to be able to execute. And I mean, catastrophic cyber attack that debate.

Yeah. Pretty hard to make the case that it's not going to happen with the next, you know, couple turns of the crank in the next couple of years, certainly, but plausibly

quite a bit sooner than that. You've already got systems here that are doing, you know, quite significant levels of lift off on zero day on one day vulnerability exploits even more economically efficient than human actors, you know, again, even if they're wrong, these estimates are, you know, or two too optimistic in a sense by a factor two a factor of five. We are well on our way. And there are, I think, a lot of debates as a society that we have to now revisit on the

basis of this evidence. There are some some goalposts that going to have to shift, some minds that are going to have to change because I, for one, absolutely remember many, many discussions with your very, very bright people who, you know, I understand a lot of this stuff better than me, but ultimately who felt quite strongly that this simply could not happen. We now live in that world. And and I think there's a policy reality that needs to catch up to all this stuff.

Andrey

That's right. And I think also worth noting that they evaluated this, system on the real world and known zero, they vulnerabilities. So they constructed out a set of 15 of them. And they did have different various severities ranging from kind of medium severity use to critical severity like source code, stir scale, live blog or admin, manage user stuff like that. So when they say that these things can talk, we'll see what vulnerabilities. That's not just like some like lab proxy

obviously a devil. And maybe like these are real vulnerabilities. And so yeah yet another example of like okay, you don't have to believe in AI doom or X risk to be concerned about AI and then really care about alignment. And on to our last story in the Synthetic Media and Art section. I just picked this one out because it was kind of interesting. The title of it is The Uncanny Rise of the World's First AI Beauty Pageant. So this is from the company Fan View, an AI infused creator platform.

I believe there might be more to it than that, but I guess you don't need to get into it. This company launched what it calls the world's first beauty pageant for AI creators, and they contest ten semifinalists from over 1500 applicants. So from what I understand, found you has a lot of AI generated imagery of beautiful women, let's just say. And so this is kind of what you see from that is the creators on the platform, have submitted some of their

creations. And yeah, there's some scoring based on their I guess, character, but also their social media clout, stuff like that. And. Anyways. Kind of weird, I guess we could say.

Jeremie

Yeah, this is it's making me think of I was watching this documentary about what's that thing? that that for married people there. That. Dating app for married people.

Andrey

I forget, you know what I'm saying? Yeah, but dating for married people. Yeah.

Jeremie

Yeah, yeah, there's a Netflix thing on it. I'm going only fan club, staring at the fan view and what that's like, they got that all me fans, one 1 million plus fans. Yeah. Like, Holy crap. These are, you know, the really big follower bases. Yeah. oh.

Ashley Madison, that was it. Yeah. And they were talking about how right the basically it turned into this like website for men to be scanned because it turns out that men tend to be the dominant user base when you when you set up a website like that. And they had a bunch of basically bots who are just, I guess the men would have to pay by the message or something, so the bots would just be cut, causing them to to burn through money and credit.

Yeah. This is the sort of thing that he could really imagine Ashley Madison using. It's pretty wild looking at his follower account to like Ben Morris. Fake eye celebrity 770,000 subscribers. Oh yeah, 450,000 on Spotify for Katy Perry. Nina, which is another fake one. I know, it's seriously weird. It looks like it's not all, by the way. Sexual stuff. It's because it is like, you know.

Andrey

Yeah, a lot of it. YouTubers. I will say a lot of it is sort of more like just an Instagram influencer type thing, or like someone who posts a lot of photos of themselves traveling or, I don't know, being fashionable, stuff like that. There's also a lot of this on that platform. But yeah, I guess the whole idea of a beauty pageant is sort of to the point of there's a whole movement towards creating I influencers, very much mirroring what human influencers do.

Pretty wild. Pretty. Pretty wild. And that we are done with this episode of last week. And I as always, I will mention once again that you can find our text newsletter at last week in that I you can find our emails if you want to give some feedback in the episode description and our Twitter handles. If you want to follow us and catch the stories when we retweet them, which we do sometimes. And lastly, we would appreciate if you review and share the podcast so we get more listeners.

But more than anything, we like to know that you listen and enjoy. So please keep doing that and enjoy the AI generated song that will come right after us.

Speaker 3

This is the wrap up. Given the chip alignment and the story's clear drama in the over the past year. Hey, I've been teased. The future's here. Tune in next week. We've got your fix. Breaking down Jekyll and the Mr.. Join us next time. Stay in the loop of AI chatter. Join the trend. From breakthroughs to the latest scoop in this day AI world, we all read you. Thanks for watching. See you soon with more articles.

Unidentified

Under the moon.

Transcript source: Provided by creator in RSS feed: download file