#169 - Google's Search Errors, OpenAI news & DRAMA, new leaderboards - podcast episode cover

#169 - Google's Search Errors, OpenAI news & DRAMA, new leaderboards

Jun 03, 2024 · 2 hr 6 min · Ep. 208

Episode description

Our 169th episode with a summary and discussion of last week's big AI news!

Feel free to leave us feedback here: https://forms.gle/ngXvXZpNJxaAprDv6

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

Email us your questions and feedback at [email protected] and/or [email protected]

Timestamps + Links:

Transcript

Andrey

Hello and welcome to a new episode of Last Week in AI, where we will summarize and discuss some of last week's most interesting AI news. And as always, I want to mention we also have a text newsletter, Last Week in AI, where we have even more news for you to read about. I am one of your co-hosts, Andrey Kurenkov. I graduated from Stanford, where I learned about AI, supposedly, last year, and I now work at a generative AI startup. And back again, we have a regular co-host. Welcome back, Jeremie.

Jeremie

Yeah. Hey, man, it's good to be here. It's good to be back. I like how you "allegedly" tier your Stanford bio. Stanford being, you know, famous for, yeah, like, bullshit AI research, as we all know. Yeah, it's good to be back, man. I just emerged from the YouTube comment section with a couple of bumps and bruises. But yeah, no, it's good to be back.

Andrey

Yeah, the comment section of the Joe Rogan podcast, where you were a guest. And, yeah, hopefully we got some converts and they'll like our podcast. We'll see. But I'm sure, if nothing else, you've educated a lot of people about AI policy and safety on there.

Jeremie

Dude, the comments are amazing. Like, Joe warned us before we went out, he's like, this is going to be like nothing I've ever experienced in the comments. And the comments especially, it was really funny. Like, I mean, obviously it's a very polarizing issue. You talk about AI, you talk about AI safety, capabilities, AGI, anytime.

But yeah, there's like a whole section about just, like, people are like, I don't trust these guys just based on the spelling of their names, which I get. I get that, right? We have very, very suspicious name spellings. Totally, totally valid. There's cool stuff there about, like, yeah, what is it, we're spooks. We're spooks for the USG, we're spooks for some other government. We are secretly trying to, like, help the labs accelerate.

It's actually a lot like what we saw with the action plan release, but just more extreme. I think we found four different clusters of conspiracy theories, one of which was like, we're trying to help OpenAI achieve regulatory capture. The other was, we're trying to have the US government stop all the AI research. To fill out the other ones: we're trying to help China. And then the last one was, we're actually, like, secretly, because our thing is not strong enough, like our recommendations aren't strong enough, we're trying to, like, unshackle AI. Because we're not saying stop AI research, that must mean that we're saying, like, barrel ahead and build. You know, anyway, it's a whole thing. I love that. It's always a pleasure to be the center of some kind of conspiracy theory or other. So, always fun, but glad to be back.

And excited for the show. We've got a lot to cover.

Andrey

We got a lot to cover. And before that, as usual, I do want to just take a moment to call out some nice listener comments. We got a couple of podcast reviews on Apple Podcasts. One said that the podcast is detailed but approachable, that we get the right balance of getting into detail without getting into too much technical stuff, which we do try to do, although sometimes we do get very nerdy. And another one said it's a good overview of AI hype, which was very nice.

They did mention a bit of feedback, that it would be nice if sometimes we talked about critics like Gary Marcus or other people out there, and I think that's good feedback; we should kind of call out some of these things.

And lastly, we also got some comments on YouTube. Mister Billy the Fisherman had some good sort of information about how in the EU each country has its own laws and can veto EU laws, and it's kind of similar to the US in that there are federal and state laws. So yeah, thank you for that comment. And FYI, I am now actually recording the video and editing the video of the podcast.

So if you do like watching that stuff on YouTube or seeing our faces, you can go over there and like and subscribe and so on. And in the future I might add graphics, I might add a bit of finesse to it. For now it's just our faces talking instead of just the audio.

Jeremie

Yeah, that's why your hair looks so good. We've been working on it for the last couple of hours.

Andrey

Oh yeah. All right. Well, that said, let's get into the news, starting as usual with the tools and apps section. And we have one main story to start with, which is that Google's AI search errors have caused a furor online. So after a very big Google I/O event, where we covered a lot of news, they did roll out a new feature for search called AI Overviews, which essentially, in response to a query, puts a generative AI kind of summary of some sources at the very top of the search results before you get to any links.

And as soon as it came out, the internet did its usual thing of testing it and trying to break it, and people very much found that it is a bit broken in many ways, and there are many examples of this. On the funnier side, I think it led to some recipes where you could put glue on a pizza, and it told you to eat rocks: if you asked it how many rocks you should eat per day, it said, well, I don't know, one, two, three or something like that.

And then on a more serious side, if you asked how many Muslim presidents the US has had, it claimed that Barack Obama was Muslim, which was a big sort of conspiracy theory back in the day. And, yeah, that's, I think, a serious example: if anyone asks how many rocks should I eat, well, that's not going to cause real misinformation. But if you're saying that Barack Obama was Muslim, that's pretty serious, and that's pretty problematic.

And Google's response to all this has been very defensive. It has basically said all of this happens for a minority of queries, it isn't a big deal. They have kept the feature, and apparently they have been manually removing bad results instead of, like, trying to fix the feature itself.

So a real bad blunder on their part again, as happened a few months ago with the image generation fiasco, and I think it showcases that Google, unlike some of their previous efforts with Waymo or things like that, is moving super fast and not very carefully into adding all of these AI features.

Jeremie

Yeah, like one way to read this is as a response to pressure. You know, everybody's talked about OpenAI, right, exerting pressure on Google. Famously, I think Satya Nadella, CEO of Microsoft, obviously with that close tie to OpenAI, said "I want to make Google dance," right? Mission accomplished. All these launches, the Bard stuff, the, you know, image generation stuff, like the Gemini launch and now this. It does read like unforced errors.

But of course, the challenge is when you want to move at this kind of velocity and ship faster, something's got to give. You're going to have either more false positive errors or more false negative errors. And, you know, they're choosing to go in this direction, which may ultimately actually make sense. Faster shipping. That certainly is the philosophy at OpenAI.

We've seen gaffes at OpenAI. But the difference there is when you look at OpenAI, you tend to think of them as a younger company. You know, they've been around for a little bit, but they really are, for the first time, shipping some real features, or have been in the last few years. So people look at it a little more forgivingly. You know, they look at what OpenAI is doing with a bit of, you know, rose-tinted glasses.

It's also the case that OpenAI has often been the first to launch a completely new category of feature, right? So you have Google here launching something that we've kind of seen before. It's not the first time, right? Perplexity is experimenting with similar things as well. No doubt that's putting additional pressure on Google, on top of OpenAI's. But they don't get credit for being first here.

So that's a big ding. So the expectation then is, okay, if you're not going to be first, you're going to be best, or at least you're going to do it without gaffes. I think that kind of raises the bar a little bit on the expectations, especially because Google's a bigger company. So maybe a, you know, bigger downside risk for them, but they have no choice. I think they really have to ship faster.

They've got to be able to collect data on failures like this and iterate their processes, which they certainly will be doing. One thing I thought was really funny: at the bottom of this New York Times article, when they talk about all this litany of failures, embarrassing things that were put out by the AI Overview, they write that a correction was made to the article.

They say an earlier version of this article referred incorrectly to a Google result from the company's new AI tool, AI Overview. A social media commenter claimed that a result for a search on depression suggested jumping off the Golden Gate Bridge as a remedy. That result was faked. So, ironically, The New York Times, in reporting on the fact that Google screwed up, screwed up their facts, which I thought was kind of funny. But anyway.

Andrey

Sort of, yeah. And that's a good example of how this kind of became a meme on its own, where it was like a meme template. It was really funny, some of these things, like when you Google cheese not sticking to pizza, it says you can also add about an eighth of a cup of nontoxic glue to the sauce to give it more tackiness. And I don't know, maybe that's true. Maybe there is glue for food.

Jeremie

Andrey, I was just on vacation in Italy. I can tell you this is absolutely something that they do there. So, yeah, look into it. That's all I'm gonna say.

Andrey

That's right. But at the same time, I think you're absolutely right that this is in response; like, the narrative last year was Google is behind, Google has sort of lost its mojo, it used to be an AI leader and now they're really lagging. And to be fair to them, I think they sort of fixed that aspect of it. Coming out of Bard, with Gemini they're now a very real player in these frontier model categories.

Gemini usually ranks similar to Claude and ChatGPT in general, and they did get some things right; Gemini, for instance, came out and was generally favorably received after Bard was generally seen as kind of lame for a while. So it makes sense that they made these mistakes. Certainly it would have been more ideal if they had moved more carefully and avoided these kinds of blunders.

But, as you say, I think on balance this is better than continuing to be behind. And especially from an investor perspective, you can see how investors are probably still kind of happy that Google is catching up, or at least moving fast, rather than being super careful and avoiding releasing until they're super ready. And that's the only major story for this section, so the rest is the lightning round. First we have: Telegram gets an in-app Copilot bot.

So this is from Microsoft, and they have added their Copilot bot to yet another platform, this time Telegram. And this, like similar integrations on WhatsApp, allows users to search, ask questions, and converse with the AI chatbot. It's currently in beta and free for Telegram users on both mobile and desktop platforms.

And I think this is a real indicator of a trend that we have seen with how AI is integrating into these products, where it's essentially added everywhere. You have Meta: we saw this added to Instagram search, to I think WhatsApp as well, and Facebook. I think the idea a lot of these companies seem to have is that AI should be everywhere, in every app, etc., where you can talk to a chatbot about whatever you want. And we'll see if that's really the case, that people want that, or whether they just want one dedicated app for talking to ChatGPT or something like that. But certainly that seems to be the play currently.

Jeremie

Yeah, it definitely seems like a worthwhile experiment. It's also interesting when you look at all these different platforms: the instances of, you know, ChatGPT or Gemini or whatever the model is that's running in the back end are all going to be different, of course, whether you're in Telegram or WhatsApp or wherever. I could imagine at some point it starts to make a little bit of sense to maybe try to integrate those, have one kind of consistent, persistent chatbot experience so that you're not kind of refreshing the context in each case. Then again, in some Telegram channels, some WhatsApp channels, you want the fresh context just for that.

But it kind of makes me wonder what the future of integration looks like. You know, do you go horizontal or does it stay vertical in this way? Obviously big security implications to going horizontal too. But yeah, I mean, we'll see. It's also being rolled out in beta right now, so just sort of in testing mode, but it's free on mobile and desktop for Telegram users. I'm not sure if you might have mentioned that already, but that was sort of interesting as a broad release, Microsoft getting more distribution. That's what they do really well, right? Getting these chatbots in people's pockets.

Andrey

And next up, another kind of similar story, and that's that Opera, the browser, is adding Google's Gemini to its browser. Opera has already had this Aria assistant, which I believe already had some chatbots, so this partnership now adds the chatbot from Google, and apparently it's part of where they're heading, so you can kind of choose this chatbot over other ones.

And there is also a feature where Aria is allowed to add new experimental features as part of its AI Feature Drop program, which apparently they have. So yeah, it's another example of, you know, AI everywhere, chatbots everywhere.

And personally, to talk a little bit more about it, I think this is probably the right approach in some ways, where I have found personally that I've gone to ChatGPT so much, in so many cases, that it makes sense to me to basically have a chatbot accessible from anywhere, one of these, you know, large language model chatbots built in, so you can ask for it in whatever your preferred platform is. But I think the initial responses from a lot of people on Instagram and WhatsApp have been like, why is this here? So again, we'll see if most people actually want it.

Jeremie

Yeah. And I think you're right. It also has a lot of strategic implications, right? Because now, to the extent that people want this stuff baked into the apps they're already using rather than going to ChatGPT or, you know, OpenAI directly, for OpenAI it's not that they get cut out of the process, but there's an extra layer between them and the user, which means that data gets shared with other people, which means, you know, there's leverage that third parties like Microsoft are going to have over the distribution and access to these systems, over and above the fact that they themselves can make their own versions of the systems too, which, as you said, is what we're seeing with Google and what we're seeing here, actually.

So I think that's kind of an interesting strategic challenge, right? Maintaining that monopoly on the flow of data, especially in a context where, you know, we're not quite data bottlenecked yet, but we may be headed in that direction in the coming years, for various reasons. So that's a really important strategic consideration.

But yeah, apparently, unlike the one we just talked about, the Telegram Copilot one, the Opera one, the developer version, is actually going to come with an image generation feature powered by Imagen 2, which is Google's image generation model. So kind of more of a multi-modal experience they're going for here, which it'll be interesting to see. Like, I could see that being a thing, you know, when you're chatting with somebody and you just want to kind of spontaneously generate an image in that context. Might be a nice user experience.

Andrey

Yeah. I mean, I think a bigger deal would be once these models are better at generating memes; I think that will really be a delight to the internet. But I guess this is also cool. And next, Amazon plans to give Alexa an AI overhaul and a monthly subscription price. So, as we've kind of been waiting for, apparently they plan to overhaul Alexa with generative artificial intelligence and, as a result, introduce a monthly subscription fee for it. This is expected to launch later this year, and the subscription cost will be on top of Amazon Prime.

So, you know, it's going to be something you have to really opt into. So yeah, I think probably the right move. Probably Apple is doing the same thing with Siri, and certainly Google has already done it with their AI assistant. So something that makes a lot of sense, and I'm sure it will make Alexa better.

Jeremie

Yeah. Amazon is obviously playing a lot of catch-up when it comes to generative AI, and they're certainly a giant. They do in particular have an internal AGI team; we talked about this, I think, a month or two months ago. It was sort of a surprise to me when I learned this, to be honest, but it's obviously a newly spun-up team, so it kind of makes some sense. Yeah, they're trying to make up for lost time, and this seems like a good way. I mean, they already have this brand in Alexa. They have distribution in Alexa. I think one of the things that Microsoft is showing is that distribution really, really matters when it comes to generative AI. So, you know, maybe this helps them close the gap.

They are actually, surprisingly, going to be challenged on hardware, Amazon will be, even though they have dominated for so long in the hardware game, being really the first, like, servers-as-a-service company, you know, infrastructure as a service, all that stuff. They kind of pioneered a lot of the early work there. But now there's more of a challenge of getting allocation from Nvidia, though that's starting to turn around. But, definitely a sense that Amazon needs to make up for lost time here, and this is one way that they could make some real progress there. The fact that it's not bundled into the $139 per year Prime offering by default is, like you said, kind of interesting.

Right. And it does track: if you look at the pricing for ChatGPT or all of these other services, you're looking at like 20 bucks a month. A lot of that is going to be profit, obviously, but that's more than the annual price of Prime in this context. So, you know, maybe at this point inference is just too expensive to tack it right on.

And I think we'll learn a lot about the unit economics of this sort of thing as we start to see the bundling with services like Prime, which may eventually happen, but we just don't know yet, and Amazon doesn't know yet how that's going to shake out.

Andrey

Yeah. And it's interesting to me that there are examples of both approaches. For instance, ChatGPT is kind of a hybrid approach, where you have a free version of a less powerful model and you can pay for the best model, and that's generally true across other chatbot providers. Twitter, or X, has had a different approach, where you do pay for a monthly subscription and as part of that you get access to Grok. So a lot of people are experimenting with different approaches, and we'll see which one wins out.

Next, going back to browsers, we've got: Microsoft Edge will translate and dub YouTube videos as you're watching them. So that's pretty much the story. The feature has support for translation from Spanish to English and from English into German, Hindi, Italian, Russian and Spanish, and this will also be available for videos on news sites like Reuters, CNBC and Bloomberg.

And I think this is one of the less discussed big deals of AI: that the language barrier, kind of across everything, will to some extent go away. And the big deal will especially be real-time translation, for instance for video chats or even in person. It will be very interesting to see if people still need to even learn languages when they, you know, want to move countries, or whether we are moving into the sci-fi future where we have universal translators.

Jeremie

Yeah, and I think you're right to latch onto that. One of the key ingredients really enabling that is low latency, right? With all the progress we've made on inference, one of the biggest challenges has been, sure, you can train these very large models to do translation and do it really well, but if they can't translate in real time, if they can't translate with, you know, a couple hundred milliseconds of latency, let's say, then it starts to feel weird, because I say something and, like, Andrey has to wait and then, like, listen to it, you know? And then we're having this, like, kind of stuttered conversation.

Anyway, so I think that's one of the big things that's really made this possible. There are so many things; I think people are still undervaluing the importance of rapid inference and what it's going to do to, among other things, agents, but also translation and other downstream applications. So, definitely one to watch as the economics for inference get more and more favorable.
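To put a rough number on that latency point, here is a tiny back-of-the-envelope sketch in Python; every per-stage figure is an illustrative assumption, not a measurement from Edge or any specific translation system.

```python
# Rough latency budget for "real-time" speech-to-speech translation.
# Every per-stage number below is an illustrative assumption, not a
# measurement from Edge, YouTube, or any particular system.

budget_ms = {
    "speech-to-text (streaming ASR)": 80,
    "LLM translation (first tokens)": 120,
    "text-to-speech (first audio chunk)": 60,
    "network round trips": 60,
}

total = sum(budget_ms.values())
for stage, ms in budget_ms.items():
    print(f"{stage:<40} ~{ms} ms")
print(f"{'total before the listener hears anything':<40} ~{total} ms")
# At roughly 320 ms the exchange still feels conversational; push any stage
# toward a second and you get the "stuttered conversation" effect described above.
```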

Andrey

And the last story for this section: Iyo thinks its gen AI earbuds can succeed where Humane and Rabbit stumbled. So this is another example of the emerging sector of sort of wearable AI tech, and as per the headline here, we are focusing on Bluetooth earbuds. This company has been around before and has already, I think, released some products, but they have this new Iyo One product that will be launching this winter and will integrate, presumably, the latest chatbots.

And apparently they actually were formed inside Google and incubated in its X moonshot factory, but are now operating independently. The product will cost $600 for the Wi-Fi model and $700 for the cellular version, but will not require a monthly subscription fee. So yeah, another try at this, and it seems like, at least I think, they will have a decent shot relative to other ones. It's not a new company: they've already released products, and the form factor of earbuds has already had this idea of integrating assistants, with Siri and so on. So probably the best bet for this kind of thing working. But again, we'll have to see whether this is going to be a major thing or whether, you know, on YouTube it will be reviewed as the worst product that people have ever seen.

Jeremie

Yeah, we're seeing a lot of those these days, it seems. But yeah, no, I mean, this has an interesting form factor, for sure. You know, they are headphones, but they're pretty big because of this pretty big battery. Apparently it has up to 16 hours of charge, and that's with a phone in Bluetooth mode. So that's quite something. But one of the big selling points they're claiming here is, look, this thing has intrinsic value no matter what. It can just function as a good pair of headphones. You've got that core, that cornerstone nugget of value, and then we build on top of that.

Whereas, you know, you compare it to, like, the Rabbit R1. And I'm just going to harp on this for a second, because while I was on my break the last couple of weeks, I did watch a video on YouTube by this channel Coffeezilla. You know, they go into the Rabbit R1 and apparently Rabbit's founder and the kind of history of alleged fraud that he's engaged in, which goes back to an alleged crypto scam that he ran a couple of years before. So there's, like, all kinds of shady stuff going on in this space. I think when we first covered this, we were sort of, like, you know, curious about it. I think they called it the large action model, if I'm getting that right. It was a LAM.

Right. And we were like, what is this? Because there's not a lot of information about it. It wasn't clear how the architecture worked, and we had a lot of open questions, but we were excited about it because, you know, surely nobody would throw, like, $30 million at a nothingburger. But hey, it's 2024, man. There's a lot of this stuff going on.

So yeah, here we have a different play, you know, Iyo coming out with this thing that has intrinsic value at the very least. In that sense, you know, maybe a bit of a safer bet that they can build on top of. We'll just have to see how it pans out. But the price point seems to make some amount of sense for right now, and hopefully the AI features are to follow.

Andrey

Yeah, that's a good point. I mean, people already spend hundreds of dollars on earbuds, and yeah, as you said, they're actually going to have value even if the AI isn't the best. So this is something I could see myself investing in, unlike the Pin and other stuff.

And onto the applications and business section. The first story is: PwC agrees to a deal to become OpenAI's first reseller and largest enterprise user. So OpenAI has courted various companies with this enterprise offering idea, and PwC, which I think is a large sort of... actually, do you know what they do? I kind of forget.

Jeremie

Yeah, yeah. So PwC is one of the so-called Big Four consulting companies. So when you think about it, I guess there's Deloitte, PwC, I think KPMG is one of them, and I forget the fourth. It stands for PricewaterhouseCoopers; it's the old name of the thing, but it's consulting. It's, you know, whatever problem you've got, they'll throw some bodies at the thing.

Andrey

Right. That's what I thought, but I didn't want to get it wrong. So this deal will mean that its 100,000 employees will now have enterprise licenses, and apparently they will be a reseller as well, to provide it to, I guess, presumably, clients of the company. So, you know, this seems like maybe a very businessy, whatever story, but I think this is a big deal. Like, enterprise is where the money is at, typically.

And with this deal, we don't know how much the dollar amount was, but I'm guessing pretty large, given that it's going to 100,000 employees now. So another kind of boon for OpenAI. It seems in recent months they've been really pushing for this, with Sam Altman meeting with people and offering this. So interesting to see if they continue to make progress on the enterprise front.

Jeremie

Yeah. And this is a new business model for OpenAI as well. They're kind of, I was gonna say, dipping their toes into the water. This apparently amounts to, like, 100,000 of these licenses, the enterprise licenses, to PwC employees. So that's, you know, that's something. But it is the first time OpenAI has done this resale model. So, you know, again, like you said, we don't know what the margins are. We don't know what the costs are going to be like.

But certainly if you're OpenAI right now, you've got to be looking around you. You've got to be looking at Microsoft and saying, you know what, these guys, if they don't own us, they're definitely... well, there was an article, I think out in maybe Forbes or Fortune recently, talking about how OpenAI and Microsoft are increasingly becoming frenemies. Just because, you know, OpenAI needs distribution, they need hardware, they need a lot of these things that only Microsoft so far has been able to offer at scale. So they're now looking for other ways to meet those needs, so they're more independent.

Microsoft is internalizing a lot of their AI model development efforts, which used to be what they'd lean on OpenAI for. They also still have access to all of OpenAI's IP, up to the point where they build whatever the board decides is AGI. But still, there's a lot of this kind of dynamic going on. You can kind of see this through that lens of OpenAI saying, you know what, we need our own way of achieving distribution, right? We talked about that in the context of the whole Telegram thing, Microsoft stuffing chatbots into Telegram and other things that have great distribution. It's what they do so well. It's the reason that Microsoft Teams beat out Slack, right? There's a long history of Microsoft using its outrageous scale to just beat out products, in some cases superior products.

And so OpenAI, you know, they've got to find a way to turn a profit. They've got to find a way to make up for those insane costs of both model training and inference, and partnerships like this help. It also looks like a pretty good headline for PwC, right? If you think about it, their job is to sell the hours of their teams' labor, their consultants' labor, to companies.

And if they have this brand as, like, this, you know, AI-forward consulting company, the first of the Big Four to sign any kind of deal like this with OpenAI, all of a sudden, if you have any problem, you're like, oh, well, which one do I choose? Maybe you go with PwC for that reason. So, a really big marketing win here for PwC and potentially a good distribution win for OpenAI. That's kind of, at least, my top-line thoughts looking at how this is shaping up.

Andrey

Yeah, this also makes a lot of sense to me. I think, you know, programmers are the number one category of people who have really adopted this into their workflows already, but I could see consultants being the number two sort of category, because, as I understand it, consultants write a lot of emails and presentations and documents.

Jeremie

Oh, they'd never use a chat bot for that stuff.

Andrey

Yeah, maybe. But certainly I think there's going to be a lot of benefits, so it makes sense that PwC went that way. And yeah, it's an interesting point regarding Microsoft, because Microsoft has kind of built their business partially around providing software for businesses, and they have all of these offerings where, you know, if you want to buy their software, you can get the business version. Copilot, I believe, also already has a business version, but Copilot is powered by GPT-4.

So Microsoft hasn't managed to build a GPT-4-scale model; they have Phi, which is a smaller one, and we do know they're working on a GPT-4-scale model. So yeah, frenemies for now. We'll have to see where it goes. And next, another OpenAI story. It seems like a lot of these business stories usually start with OpenAI. So this one is again adding to a trend that we've seen and have been talking about for quite a while.

The story is that Vox Media and The Atlantic sign content deals with OpenAI. So OpenAI has kind of done the thing they've already been doing with various outlets; we've covered how this happened with the Financial Times and, I think, a couple of other ones.

So here again, they're going to license training data from these companies, their news stories, etc. And Vox Media is planning to use OpenAI technology to enhance its affiliate commercial products and expand its ad data platform. Apparently The Atlantic is developing this microsite, called Atlantic Labs, to experiment with AI tools to serve its journalism and its readers.

And in terms of the dollar amounts of these latest deals, it doesn't seem like we know exact numbers. We know News Corp's was 250 million and the Financial Times was 5 to 10 million, so I'm guessing this was in the millions kind of category.

But yeah, once again, this is happening. OpenAI, we know, has done this multiple times before. So it's part of a race to get copyrighted data to use for, let's say, legitimate training, instead of having these companies sue them, like The New York Times has done with OpenAI.

Jeremie

Yeah. To your point, they're saying these deals also appear to provide OpenAI with protection against copyright lawsuits, right? An all-important provision. And it's, you know, a long series of these things that we've seen: News Corp, which owns the Wall Street Journal, the New York Post, the Daily Telegraph; Axel Springer, that's Business Insider and Politico; Dotdash Meredith, which I'd never heard of, they own People magazine, Better Homes and Gardens, Investopedia, a bunch of others. So, you know, you've got tons of these places, the Associated Press, the Financial Times.

You know, when you start to think about the set of corporate media that now have deals with OpenAI, right, you start to think as well about, like, okay, what does that do to affect coverage? This is a really interesting question.

We're now starting to move into that space where potentially an increasing fraction of the revenues that these outlets are generating is going to have to come through OpenAI, whether through referrals, through ChatGPT, as it cites those outlets in its responses. This is something we've seen with some of these deals.

Or just, like, you know, the way that the outlets are potentially portrayed in ChatGPT outputs. There are, like, a million different ways in which influence could be bought and sold, let's say, in that context. So this is a really interesting question.

As we migrate from a world where social media platforms are the ones that had all this outsized leverage on publications, where you can imagine Facebook turning to, I don't know, The New York Times and sort of having, through their algorithm, a huge impact on what succeeds and what fails, now OpenAI starts to become that platform, or Microsoft or whoever has these deals.

And, you know, the more you see these things accumulate, the more you have to ask yourself, what does the media environment look like? What is the likelihood, what is our guarantee, of getting unbiased coverage? You know, especially in a context where we've seen OpenAI is, you know, allegedly at least, fairly keen to shut down criticism in various ways. So, yeah, we'll just have to see what happens here. But I think this is a really important story that may not actually be told in the way that it ought to be going forward, precisely because of the economics behind the story. So one to track for sure.

Andrey

Yeah. It's, I guess, not going to be surprising if all these news companies start portraying AI in a more positive light. And it's interesting to note, OpenAI is kind of trying to have its cake and eat it too, in the sense that they have been making this fair use argument, we can train on your data even if it's copyrighted, but are also making all of these deals, with Reddit and all these news companies, to license their data.

Which also makes sense in the sense that, you know, before, I think no one was really aware that this would happen, that their data would be scraped by all of these companies for training, right? It just wasn't a consideration.

Jeremie

Or they were and they just didn't care because there wasn't enough money to be made. Right? Like now now the chat bots work.

Andrey

Yeah. And now all of these platforms are building in protections so these companies can't just do it. And it's true that, you know, these companies need up-to-date information. They need search. They need to continue gathering data, pushing the frontier of how recent the information in these models is. So I think, regardless of the legalities of this, really, these companies have no option but to do this, to access the updated information and get additional training data over the coming years.

Yeah. So a very interesting and very kind of important trend to be aware of from an economic and business perspective, because this is the world that's shaping up, and the kind of relationships that are happening that didn't exist in years prior.

Jeremie

And I do want to flag, by the way, when we talk about all these outlets and the incentives: I'm not saying I have seen, like, The Atlantic or News Corp outlets and stuff like that behave a certain way. I think that, you know, we're going to see potentially that evolve. I think this is more a flag to plant. This may or may not come to pass, but the incentives may be taking shape in that direction. Just want to be clear.

I'm not accusing the Wall Street Journal or the New York Post, specifically, of doing anything bad in this direction right now.

Andrey

Yeah, that's definitely true. And it is well worth also flagging that, you know, these are business decisions. It seems like journalists, the actual employees, may have more mixed feelings about all of this, and we've certainly seen a lot of criticism levied at AI by such publications. So, yeah, yet another example of all of this happening. And on to the Lightning Round, with just one more story about OpenAI.

I guess we're on a streak here, and this one is about how OpenAI is launching programs to make ChatGPT cheaper for schools and nonprofits. So that's pretty much the story: they're going to offer it at a discount to these categories.

And to me, I think it's interesting kind of as part of a trend where AI, and in particular chatbots, are shaping up to be a landscape where it's a bit of a commodity, in a sense, where now you have multiple options, almost like cloud computing, where a few companies that are big are able to train frontier models that are almost equivalent in terms of performance.

So it feels to me like these moves are in part to sort of lock in usage, to get people to continue thinking of ChatGPT as sort of a default, as you do with browsers or cloud providers. Because, you know, in a way, Uber and Lyft are another example where these are basically equivalent products. And to keep leading, you know, you probably can't compete on price too much and you can't compete on quality too much.

So what you're going to compete on is a lot on a brand side, and what people are just used to using.

Jeremie

Yeah, yeah. Well, this particular piece is that nonprofit push, apparently. You know, OpenAI for Nonprofits is allowing nonprofit workers to use ChatGPT Team at this discounted rate, 20 bucks per month per user. So that's, you know, an attempt to sort of, I guess, yeah, build that brand, as you say. And then they've got a tiered discount system for larger organizations.

And then in this post, they share a bunch of stories about, you know, specific cases where people have been able to use their tools for, as they put it, a tangible social impact. They highlight folks who are using it to access international funding, a lot of this around automating the process of putting in grant applications for different grants that would help these nonprofits thrive. So, you know, there are certainly very interesting use cases.

I will say, you know, it's interesting that there has been so much buzz about OpenAI, so many news stories coming out, that just happen to have come after, I don't know, a week where it seems like there's been a lot of fairly bad news about OpenAI, some news that some might say casts doubt on, perhaps, allegedly, the integrity of the management of Sam Altman in particular. There's been a whole bunch of whistleblower stuff that has come out, and now we're getting blasted by this whole "introducing OpenAI for Nonprofits" thing.

So I think this is pretty interesting. You know, if you're putting your media hat on, your marketing hat on, your, let's say, brand damage control hat on, this might be the sort of thing that you might expect to happen after a week of hell like the one that we've just seen. You know, not to say these aren't wonderful things. Hey, OpenAI for Nonprofits. Kudos. There's a lot of great stuff OpenAI has done on the security side, flagging some influence operations, information operations, that they've just caught people trying to pull off using their tools. That's all good stuff, credit where credit is due.

But, you know, you've got to notice the timing here. It is noteworthy at the very least. So, anyway, it kind of makes me wonder about the timing of this particular announcement, too.

Andrey

Yeah, you've got to wonder, I agree. And this next one is another topic we like to talk about, which is computing, and in particular the frontier of computing that will enable AI to continue scaling. And this one is about China, also a topic we like to cover. So there's a new Huawei patent that reveals 3-nanometer-class process technology plans, and once again, this is in spite of US sanctions that really do limit the ability to build this sort of technology.

So I think, Jeremie, you're usually the expert on this, so I'm gonna let you take over.

Jeremie

Yeah. No. Of course. So, yeah, big picture: you know, we've talked about this a fair bit, this idea of, like, the different nodes, right, in the semiconductor fabrication process. So, you know, the three nanometer node is currently the sort of leading node. It's the most advanced process we have for developing really, really tiny semiconductors. Currently three nanometers is basically all used for the iPhone.

It's what you tend to see. So as the semiconductor fabs like TSMC in Taiwan learn how to make the next generation of node, these even smaller feature sizes, that generation of node tends to be used for smartphones, because you need the smallest, most compact processing. So what's happening now is the node above that node, so the next best node, the four and five nanometer nodes, are being used for AI processors like the Nvidia H100, the B100, those sorts of things.

So historically, what's happened is the US has been trying to prevent China from accessing the three nanometer, the five nanometer processes. China made a breakthrough with the seven nanometer process, which is the scale that's used to make the Nvidia A100, the GPU that was used to train GPT-4. So they can basically in-house, roughly speaking, make GPT-4-level models, or, you know, the kinds of processors that go into that.

So one of the key things that has happened is they've been cut off from accessing extreme ultraviolet lithography tools. These are the tools that you really need to make those five nanometer, those three nanometer nodes; China can no longer access those. That's a US export control policy that kicked in fairly recently. So now, you know, they can't access these extreme UV lithography tools. What do they do?

Well, instead of using extreme ultraviolet light to etch these tiny features, what you can do is use a kind of weaker or less effective tool called deep ultraviolet, a sort of longer wavelength of light, less powerful, less effective for this, but you can pass over your chip many times to achieve functionally the same level of resolution. That's called multi-patterning. So you use the same kind of crappy old deep ultraviolet lithography tools, but just kind of pass over your wafer multiple times, pass over your chip multiple times, to achieve a higher resolution.

But because you're passing the same thing through the same system multiple times, it's slower, right? It takes longer to complete a production run, which means that the cost per chip is higher, and your yields can be pretty bad too. So the economics of this process tend to be worse.

What we're learning about here is Huawei and SMIC have this patent out for not double patterning but quadruple patterning lithography methods, so taking the same chip through four times, which originally people thought, oh, they're going to use this to get to the five nanometer process, basically to get to the H100-level node. Apparently, though, they have plans to use this technique to get all the way down to three nanometers, which would be really interesting.

This is, like, you know, strategically, something that could position China to compete much more long-term, before they kind of are unable to keep up with the extreme UV lithography stuff that we can benefit from in the West. Always a question of, like, okay, sure, but what are the yields going to be? And chances are they're going to be really crappy. But the state apparatus in China is willing to stuff tons of money in. They're willing to lose money on this big time and make it a strategic priority to keep up on AI.

So, you know, a lot of stuff going on here. But the bottom line is Huawei and SMIC seem to have a plan, at least; whether or not it'll work is another question. They have a plan to get to a three nanometer node size with this multi-patterning technique that involves, well, quadruple patterning: taking the same chip, the same wafer, through the fab process literally four times.
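To make those economics concrete, here is a rough back-of-the-envelope sketch; all of the inputs are illustrative assumptions, not actual SMIC or TSMC figures, but it shows why extra lithography passes compound into a much higher cost per good chip.

```python
# Back-of-the-envelope cost model for multi-patterning with DUV lithography.
# All inputs are illustrative assumptions, not real Huawei/SMIC/TSMC figures.

def cost_per_good_chip(base_wafer_cost: float,
                       passes: int,
                       extra_cost_per_pass: float,
                       yield_per_pass: float,
                       chips_per_wafer: int) -> float:
    """Each extra litho pass adds processing cost and multiplies in another yield hit."""
    wafer_cost = base_wafer_cost + passes * extra_cost_per_pass
    overall_yield = yield_per_pass ** passes
    good_chips = chips_per_wafer * overall_yield
    return wafer_cost / good_chips

# Hypothetical comparison: single-pattern EUV vs. quadruple-pattern DUV.
euv = cost_per_good_chip(base_wafer_cost=10_000, passes=1,
                         extra_cost_per_pass=2_000, yield_per_pass=0.9,
                         chips_per_wafer=60)
duv_quad = cost_per_good_chip(base_wafer_cost=10_000, passes=4,
                              extra_cost_per_pass=2_000, yield_per_pass=0.9,
                              chips_per_wafer=60)
print(f"EUV single-pattern: ~${euv:,.0f} per good chip")
print(f"DUV quad-pattern:   ~${duv_quad:,.0f} per good chip")
# With these made-up numbers, the quad-patterned chip comes out roughly 2x as
# expensive, before even counting the slower throughput per production run.
```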

Andrey

Wow. Yeah. Interesting. Another example of me learning about things I had no idea about just while recording this podcast. Next, Nvidia, powered by the AI boom, has reported soaring revenue and profits. So AI chip sales have actually exceeded expectations, leading to yet another rise in its stock and a market capitalization of over 2 trillion.

The revenue for the three months ending in April was 26 billion, exceeding their February estimate of 24 billion and tripling sales from the same period last year. And, yeah, this is something we've seen so much with Nvidia: they keep rising and keep sort of leading the charge, certainly in terms of valuation. And we are at a point where their sort of ratio of price to earnings is like 260. Usually with tech it's like 20, maybe, you know; we've seen with Google, Meta, etc., they make a lot more revenue. But Nvidia is now in a class of ultra-expensive tech stocks. So certainly people believe that Nvidia will continue to be dominant.

Jeremie

Yeah. And this is also on the back of, you know, Nvidia has done a lot of things structurally to radically accelerate its shipping velocity, which I think is a positive for them overall. You know, they've moved from shipping a new generation of GPU for AI every two years to every one year, right? So we go from the H100 to the B100 to the X100 and so on. The roadmaps are also getting shorter. They're looking to iterate faster.

This is all kind of playing into, you know, presumably, their valuation and how people are looking at them and, yeah, evaluating them as an opportunity. It's interesting. I mean, there's a lot of competition shaping up. We've talked about it a lot on the podcast. You know, you've got AMD, you've got all kinds of other companies that have more specialized chips and things like that. But certainly, you know, as they point out, they have the best distribution.

They also have, as we've talked about on the podcast, the best allocation, right? One of the things Nvidia is really, really good at doing is buying out crazy amounts of allocation from all the factories that fabricate the semiconductor chips. So Nvidia designs the GPU and they ship their designs to, like, TSMC, for example, for actual fabrication and packaging. They buy up all the capacity they can, so there's no room for competitors.

And that's been one of the things that's really differentiated Nvidia from other companies. They're able to use their market dominance to preserve their market dominance in that way and buy out all the packaging, all the fab capacity they can. It's aggressive, but they're willing to lose money by having things just sitting in inventory and getting depreciated, just because Jensen is super aggressive and, it's got to be said, effective.

Andrey

Oh, yeah. And I believe it's still the case that the market is supply constrained. You know, people are still competing to get their hands on the leading-edge compute. So this is not an example of a race to the bottom. In fact, Nvidia has, as far as I understand, fantastic margins on these products. So yeah, certainly if you're an investor you're very happy right now.

Jeremie

Yeah. We're seeing their margins starting to take a little bit of a hit as they're changing their pricing in response to what they're seeing from the competitive landscape. But this is consistent with Nvidia, you know, looking at the landscape and saying, look, we're going to make it up in volume, we do want to starve out our competitors. So, you know, that's, I think, a long-term bullish move.

And it's also what you expect, you know; as a company matures, those margins are going to drop in a competitive space like this.

Andrey

And one last story: Elon Musk's xAI raises 6 billion in its latest funding round, at a pre-money valuation of 18 billion, meaning that they are one of the highest-valued companies in the space. We covered how this was already rumored a while ago, and I believe at the time Elon Musk refuted the rumor, saying we are not raising money. Well, it seems that they are, and certainly this, if nothing else, makes them competitive.

They have the money to train frontier models, and they are in the league of OpenAI, Meta, and Google as far as companies able to build these models, though, you know, Grok, as far as I understand, is still quite a bit behind.

Jeremie

Yeah. And, you know, Elon posted on X that he is setting a pre-money valuation of 18 billion. So they've raised an additional 6 billion, so post-money they're sitting at 24 billion. That compares similarly to companies like, you know, think about Anthropic here, right? That's maybe the orbit, a little bit more than maybe Cohere, if I recall.

So, you know, these sort of mid-sized companies, we've talked a lot about whether it's possible for companies like this to actually survive in the kind of market dynamics that currently exist, especially around hardware, right? This $6 billion is a lot of money, but when we're looking at a world where compute training runs will cost in the billions of dollars in the coming years, it doesn't necessarily get you that far, and you're going to have to show some ROI pretty soon.

It's interesting. You know, you look at the set of investors Elon is raising from here: Prince Alwaleed bin Talal and a bunch of others. So he's already kind of reaching for sources of money that are not necessarily just your A-list Silicon Valley sort of VCs. There are certainly some impressive ones here, right? We've got Andreessen Horowitz, we've got Sequoia. But when you're already raising from Saudi investors, you know, you're at that stage: you're raising 6 billion, you have to cast a wide net.

It's an interesting question as to whether this is sustainable; it's a strategically dicey situation. OpenAI had to align itself with a hyperscaler like Microsoft to get access to the computing resources they needed. Anthropic has done sort of similar things with Google and Amazon. Obviously, Google DeepMind is completely in-house now. So, you know, there's this interesting question as to whether xAI's position is stable long term.

They're definitely the largest market cap version of this problem that I've heard of, that I believe exists. So I'm especially interested, from a kind of market dynamics standpoint, as to whether their position turns out to be tenable in the next few years. That being said, Elon thinks AGI by 2027 is highly possible. So if that's the case, yeah, maybe this is enough for now. Who knows.

Andrey

And onto the projects and open source section. And we start with another favorite topic that we go to more and more, which is evaluation. The story is that Scale AI has published its first large language model leaderboards, which, as with other leaderboards, rank AI model performance, in this case in specific domains. So these are called the SEAL leaderboards, and they evaluate on several categories like coding and instruction following.

This is similar in category to things like MLCommons' benchmarks, Stanford's Transparency Index, and other things we've covered, the LMSYS leaderboard and so on. But Scale AI does say that there are some flaws in those; for instance, MLCommons has public benchmarks, so companies can train their models specifically to do better on them. And we've covered how leaderboards are gameable, and probably companies are gaming them, because this is now important for PR and you always want to say you have the best model.

Generally, the results they show are pretty consistent with the picture we know: GPT-4o is still leading, Llama 3 70B is close behind that, Mistral Large is similar, Claude is similar, Gemini 1.5 Pro is similar.

So yeah, nothing too new, but I think a new entrant in the space that a lot of people are excited about.

Jeremie

I think there's a lot to like about what Scale is doing here. First of all, it's very, very welcome, as you said, to have these private leaderboards, right? They are harder to game, and they're getting human beings to actually run these evaluations. So we're not looking at automated evals, which, you know, has been a big trend that we've seen, especially using things like GPT-4 to evaluate the performance of language models, which has a whole bunch of interesting failure modes, right? Because GPT-4 doesn't always evaluate or rate things the same way that a human would, you know; it can succumb to adversarial attacks, it can go out of distribution, do all kinds of things.

So this is, I think, a really expensive effort, because you've got to get all these contractors and outside folks; they say they're bringing in experts to evaluate these things. But it's worth it, because it at least gives us a really kind of clear picture of how these models actually stack up against each other. And that's been a real challenge historically, you know, knowing which models are best for what. So nice to see that progress being made. They do let you dive into different categories of evaluations and leaderboards that they have: they have some for coding, some for language, some for math.

The coding one, for example, they've got a thousand prompts, they say, spanning a diverse array of programming languages and tasks. And anyway, so the whole idea here is, in each case, they're using Elo-scale rankings to kind of have two different models compete against each other, and then they have the winner get scored by, or determined by, a human. So those Elo rankings, we've seen them pop up on all the leaderboards. It's a very common, popular way to do this. It really is the best, most robust way that we've seen evaluations get done, just because it's a relative ranking; you can't saturate that metric in the same way that you can saturate others like MMLU or whatever.

So kind of interesting. And again, human evaluators: very expensive, but very important. And they've got a whole process that they lay out for coming up with these things. I thought the one thing to highlight from the actual data they share is, you know, Claude 3 Opus does seem to do best in math, though GPT-4 tends to win across the board, in some cases just by a hair. But I thought it was interesting because it does objectively give us that indication that, you know, Claude 3 Opus, when it comes to mathematical reasoning, does indeed seem to be the superior model, at least given the noise that they've got in their evaluations.
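For anyone curious what an Elo-style leaderboard update actually looks like, here is a minimal sketch; the K-factor, starting ratings, and match outcomes are illustrative assumptions, not Scale AI's actual SEAL methodology.

```python
# Minimal Elo-style rating sketch for pairwise model comparisons.
# The K-factor, starting ratings, and match outcomes are made-up illustrations,
# not Scale AI's actual SEAL methodology.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated ratings after one human-judged head-to-head prompt."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Hypothetical run: "model_x" wins 2 of 3 human-judged prompts against "model_y".
ratings = {"model_x": 1000.0, "model_y": 1000.0}
for x_won in (True, True, False):
    ratings["model_x"], ratings["model_y"] = update(
        ratings["model_x"], ratings["model_y"], x_won
    )
print(ratings)  # model_x ends a bit above 1000, model_y a bit below
```

Because the score is relative, adding more prompts just sharpens the ranking rather than "saturating" it the way a fixed-answer benchmark like MMLU can.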

Andrey

Yeah. And, another reason to be excited about this is, just building on some previous efforts of scale where they published a research paper showing that on one of the popular benchmarks, in fact, people were gaming it. If you created a new private variant of a benchmark that wasn't already out there, then your models did not

generalize. Some generalize more of an others, but things like Phi, for instance, and some of these smaller models that claimed to be very good at smaller scales, they didn't quite, perform as well as the benchmark said so makes sense for them to continue pushing on restaurant next cohere for AI launches AI 23 with eight and 35 billion parameter versions and the open weights being

released. This is building on the previous Aya effort, which brought together 3,000 collaborators from around the world to build the largest multilingual instruction fine-tuning dataset and a massively multilingual model. And what that means is that, for instance, Aya 101 covers 101 languages and is focused on breadth, whereas Aya 23 covers 23 languages but is more focused on

depth. So they are focused on really making this a very powerful multilingual large language model. And this is pretty important because they say that this expands state-of-the-art language modeling to nearly half of the world's population, and this is now available for people to experiment with, explore and build on for research and safety auditing. So yeah, exciting new model release. And we've seen a lot of smaller models being released in the 8

billion model category. But this is a little bit different with a focus on multilingual models. You know, we still know that models like GPT don't necessarily perform as well on foreign languages. So pretty exciting.

Jeremie

Yeah. This is clearly a big differentiator that Cohere is trying to go for. They have this, I think it's a foundation, I'm not sure if it's a nonprofit, tied to Cohere that they call Cohere For AI, and it is focused, as you say, on this sort of multilingual strategy of trying to have the AI perform really well in as many languages as they can. This latest one. So we talked about Aya 101 I think when it first came out.

That was this, like, impressive attempt to cover a wide range, over 100 languages. This one, they're using a pre-trained model that's really highly performing, and they're coupling it to their Aya dataset collection, which apparently is a new dataset that they've released, to go in depth and achieve that really high performance on the 23 languages that they've selected here.

It is apparently a really big multilingual collection has 513 million instances of prompts and completions for a whole bunch of different tasks. So we're looking here, presumably at instruction, fine tuning data, maybe dialog data as well. But that that certainly seems to be a lot of like fine tuning examples that they're feeding it. And then they do share win rates too. So it, it compares favorably to at least the models they're comparing it to here.

There's Mistral 7B Instruct, and they show the win rate of Aya 23, at least the 8 billion version, winning about two thirds of the time, for example. And it's pretty representative across the board. So, yeah, an impressive new model from Cohere. Definitely another entry in that catalog of multilingual models. You know, Meta has done a lot of stuff in this direction too. The field of open source models, we've said this a lot, but it is getting pretty crowded, right?

I mean, we are getting to the point where, I think, I don't know, controversial hot take, most of the time when companies drop a new open source model, it's literally just for the headline. I think that's kind of the world that we're getting to, especially when you look at these kind of late-following models, not necessarily this one, but the late-following models, you know, that don't set a new SOTA for their class.

So this certainly seems to be important, at least for one category of the story. But the categories that we're making advances in are getting narrower and narrower, right? We're having a look here at a very specific problem of low resource languages and things like that. So anyway, there's Cohere's differentiator for you, and a solid new development.

Andrey

Yeah. And they compare to particular other open source models in the same class of number of parameters. So they compared to Mistral 7B Instruct and to Gemma 1.1 7B. They don't compare to Llama, interestingly, although maybe I'm missing it; this is just in the graphs they share. But as you might expect, on their visualizations they have the top numbers, although Gemma seems to be pretty decent and Mixtral 8x7B is quite good on some other benchmarks. So, yeah, a new open source model.

And that one I think researchers will like. On to the lightning round. The first story is: who will make AlphaFold free? Open source scientists race to crack AI model. So a little bit less of a news story and more of kind of a summary. DeepMind, unlike with previous AlphaFolds, hasn't released the computer code for this. They promised to release the code by the end of the year, but researchers worldwide are now working on their own open source versions.

Apparently there are 600 researchers who signed an open letter in Nature asking for this to be open. So, yeah, a lot of people are trying to work on this. And we've seen before that open source versions of AlphaFold have been developed, so I wouldn't be surprised if scientists do crack this problem.

Jeremie

Yeah. There's a question here, too: despite the commitment from Google to open up a version of AlphaFold 3, there are questions about, okay, well, what will that version specifically allow you to do?

One of the things that, one of the researchers, they're citing here, this is a guy who's actually trying to replicate the model in open source, a kind of more open source version of it is he's he's claiming basically, look, I don't know that they're actually going to give us the ability to predict the structure of proteins, in conjunction with any, ligands or, say, drug molecules, roughly speaking.

So, you know, there are specific use cases that they're concerned they won't have access to, even when Google actually does do this, quote unquote, open sourcing of the model. It's interesting, I think, that they cite at least three examples of independent efforts to try to replicate this model. That did happen, of course, with AlphaFold 2. But, you know, it is kind of noteworthy.

The most advanced one, at least as far as I could tell, seemed to be headed up by this guy, Phil Wang, who's in San Francisco. He's got this crowdsourced effort, and they claim they'll have code that functionally replicates AlphaFold 3 within a month.

But then, you know, separate question about the actually training the model using that code, which interestingly, they said, you know, estimated compute costs of around $1 million, which, you know, struck me as being surprisingly accessible. I guess I wouldn't have expected it to be that cheap. So if that's the case, if that actually is true, and I think there

are a lot of asterisks there. But if that is true, you know, then you could pretty plausibly see impressive results coming out on the open source side of things within, you know, maybe a year or so or less, which has a lot of implications for drug discovery, a lot of implications for Google's competitive moat, because Isomorphic Labs and Google DeepMind have been partnering on these things. That is their moat.

But also just, you know, you think about designer, you know, designer bioweapons and things like this. You know, as you start, it's not any one particular discovery that will unlock big risks, necessarily. But certainly when you look at this, you know, this is a big step. You're talking about models that can actually model the interaction between drugs and biomolecules. That's, you know, that gets you a lot closer to things that could

be, pretty dangerous. So you think about open sourcing those things. There are all kinds of questions that, naturally get raised.

Andrey

And the last story of this section: Mistral releases Codestral, its first generative AI model for code. So they trained this on over 80 programming languages. And as with other models of the sort, it can complete functions, write tests, fill in partial code, and answer questions about a code base in English. As with other Mistral efforts, they say this is open, but the license does prohibit use of it for

commercial activities. And as far as size, this is a 22 billion parameter model, so not easy to use. You do need a lot of compute for this sort of stuff.

Jeremie

Yeah, we've covered in the past that people get all, you know, fussy about, you know, oh well you're open sourcing it, but it's a 22 billion parameter model. So who can actually use that. But the big companies which you know fair enough. But like credit where credit is due, this is an open source model. So you know it's out there. other interesting stat 32,000 token context window, which they advertise as being longer than competitors.

Let's put a pin in that phrase, longer than competitors, because I think that there's a really interesting little subplot here in terms of the comparisons they're choosing. But yeah, so a 32,000-token context is a lot when it comes to coding. That can be especially important just because, you know, this is a model that has had to read 32,000-token chunks of code. So, you know, maybe the ability to plan over sort of larger bodies of code and write more coherent code

that accomplishes more complex things. So that could be exciting. Okay, I want to take a second to talk about evals here. So, you know, they do a great job of highlighting evaluations that make this model look really good. HumanEval, you know, it knocks out of the park. HumanEval, by the way, is just a benchmark where people create a bunch of unit tests for fairly simple programming problems, and they basically get the model to try to pass the unit tests,

right. One really interesting evaluation, though, is CruxEval, or CruxEval-O here. This is a benchmark that was designed by Meta, and it's for input/output predictions of functions. So given an input, can the model predict the output of the function? And then given the output and the function, can you predict what input went into it? So it's kind of an interesting test of the model's ability to understand the logic of functions.
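To make that concrete, here's a toy example of what an input/output prediction item might look like; the function and values below are invented for illustration and are not taken from the actual CruxEval benchmark.

# Toy illustration of the two prediction directions described above
# (not an actual CruxEval item; the function and values are made up).

def f(s: str) -> str:
    # Keep only the vowels, then reverse the result.
    vowels = [c for c in s if c in "aeiou"]
    return "".join(reversed(vowels))

# Output prediction: given f and the input "banana", the model should say "aaa".
assert f("banana") == "aaa"

# Input prediction: given f and the output "ea", the model should propose some
# input that produces it, for example "apple".
assert f("apple") == "ea"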

In my opinion, they're cherry picking their comparisons on this metric in particular pretty hard. So in Meta's original paper, when you look at the CruxEval benchmark the first time they came out with it, they actually test Code Llama 34 billion, the 34 billion parameter version of Code Llama, and they find that it gets a 50% score on this metric. But Mistral doesn't show Code Llama here.

They only compare the model to Llama 3 70 billion parameters, which is not fine-tuned for code, which Mistral's model absolutely is. So this is a very unfair comparison a priori, as far as I can tell. You know, you have a specialist model and you're stacking it up against a generalist model. Yep, Llama 3 70B is a bigger model and all that, but you get a lot of value out of that fine-tuning process.

Interestingly, Llama 3 70B then gets 26% on this benchmark and Codestral gets 51%. So the interesting thing is they're trying to say, look, Codestral gets 51%, look at how much better we are than Llama 3. Again, a generalist, not a specialist model. When you actually do more of an apples-to-apples thing, what you find is Codestral gets 51% here, which is barely ahead of Code Llama, which is based on Llama 2, just fine-tuned for code.

So the apples to apples here seems to suggest, at least to me, and maybe I'm getting this a bit wrong, but I don't think by that much, it seems to me that Codestral is actually maybe more of a past generation, more like a Llama 2 level model. Which raises interesting questions and challenges here about their positioning. Again, Mistral is one of these companies that does not have a partner with a large body of compute infrastructure.

They're well-funded, but not well enough funded that you would expect them to compete favorably with, like, OpenAI or even Anthropic or Google DeepMind. So, you know, this kind of makes me wonder. It's clearly, at least to me, kind of a nitpicked, or not nitpicked, a cherry-picked set of comparisons. That's how it comes off to me, at least. That may raise some questions about how good these models end up being. But let's wait to see leaderboards.

And, you know, who knows? I may prove to be a complete wing nut on this.

Andrey

But yeah, I agree, those are a lot of good points. And it kind of shows how Mistral operates as a company. So as with other companies, they do want to maybe game things or, you know, make this partially a PR move. One of their advantages is they are sort of a national champion, so to speak. France is pretty proud of them and backing them, which is not the case with various U.S. companies. So they do have some advantages. But overall, right, they're fighting a hard fight.

And onto the research and advancement section. The first paper is pretty nerdy, so we're going to try to make it accessible; we'll see if we can do it. It is titled The Road Less Scheduled, and you've got to love the fun titles that AI papers get. And so it's dealing with optimization and how you train your model. So for optimizers, typically people use a variant of Adam, which is an optimizer that's been popular throughout deep learning for a while.

And as with other parts of the learning process, these optimizers have parameters that you have to tune, kind of magic numbers that there isn't necessarily a great science for. Usually you do, ideally, a hyperparameter sweep to see what the best number is for this; there's no kind of known answer. So this is introducing a new way to optimize that takes away the need for learning rate schedules.

So in particular, some of these schedules depend on the optimization stopping step T, and they introduce this schedule-free approach that does not introduce any additional hyperparameters over standard optimizers with so-called momentum. There's a bunch of math, a bunch of details we can't get into. And I believe when the results were first shown on Twitter a couple months ago, it generated quite a bit of buzz, quite a bit of excitement. And now the paper has come out.

And if you look at the various graphs they show, they don't quite go as big as large language models. They compare things like ImageNet training of image models, and they optimize on quite a few different benchmarks, 28 problems ranging from logistic regression to large scale deep learning problems. And the schedule-free methods show strong performance, matching or outperforming heavily tuned cosine schedules.

So again, exciting if you know your stuff when you engineer and train these models, because it's really annoying to try and guess what the hyperparameters should be. And yet more progress on some of these fundamental problems we need to address.

Jeremie

Yeah. And I'll give just a quick attempt at explaining this whole learning rate thing for listeners, if you're interested, because I think, you know, it is important. Basically, every time you feed your model a new batch of data during the training process, to have it update its weights, its parameter values, what ends up happening is you have to decide, okay, well, how much am I going to update my model based on the data that I just

fed? Right. Am I going to assume, like, oh, man, I got a completely change all of the weights, all the parameters in my model in response to the mistakes that I made on this training batch? Or am I going to go, well, I'm just going to take a very cautious little step here, right? So this is where the idea of the learning rate comes in. That is the learning rate. Big learning rate means taking big steps, making big changes to your model in response to the data that comes in during training.

Small learning rate means making very kind of tentative, careful changes to your model during that process. And what people have found historically is that a steady, consistent learning rate throughout the training process is not

actually optimal. It's often best, for example, it can be best to do things like have a faster learning rate, take bigger steps early on in training as the model is just learning from scratch and has to be kind of reinvented, let's say with the data as it comes in, and then gradually reduce the learning rate as the

model hones in. And then there are a whole bunch of strategies, including momentum-based strategies, which essentially, well, we're not going to get into the details, but basically there are strategies to tune the learning rate in a more and more intelligent way, using heuristics that aren't as simple as just, we'll start with this and we'll ramp it down. And this is an attempt to further optimize that process by getting rid of these learning rate schedules. How did I do?
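As a quick illustration of what one of those classic schedules looks like in practice, here is a minimal warmup-then-cosine-decay sketch in Python. The peak rate, warmup length, and total steps are arbitrary example numbers, and the schedule-free method in the paper works differently; it is precisely about not having to choose these knobs in advance.

import math

# Minimal sketch of a classic warmup-then-cosine-decay learning rate schedule.
# peak_lr, warmup_steps, and total_steps are arbitrary illustrative values.

def lr_at_step(step: int, peak_lr: float = 3e-4,
               warmup_steps: int = 1_000, total_steps: int = 100_000) -> float:
    if step < warmup_steps:                      # linear warmup from 0 to peak
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * min(progress, 1.0)))

print([round(lr_at_step(s), 6) for s in (0, 500, 1_000, 50_000, 100_000)])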

Andrey

That's a, yeah, great summary. Well, one thing we should mention is, of course, the optimal learning rate is 3e-4. That's the magic Andrej Karpathy number. So yeah.

Jeremie

Pretty here. First set it to to ten.

Andrey

Oh, this is, you know, if you're in the know, you know that's the perfect answer. All right. Next up, a little bit different, we've got a story: training compute for frontier AI models grows by 4 to 5x per year. So this is some analysis from Epoch AI. And they essentially graph the estimated compute. We don't really know the compute used for a lot of these AI models, but they take quite a few different

notable models, things like AlphaGo Master, AlphaGo Zero, GPT-3, PaLM, GPT-4, Gemini Ultra, going all the way back to AlexNet in 2012. And once you graph it, they say that it's, yeah, about 4 to 5x per year, with various examples: Google and OpenAI are at five x per year, Meta is at seven x per year, which kind of makes sense to me. And this is pretty relevant because people have been predicting, you know, for instance, that we will get to

trillion parameter models. In fact, we're already there, at least in terms of mixture of experts models. So pretty interesting to see a trend that has been at least somewhat consistent. With the frontier models, it's not quite as big: in the early deep learning boom, we had growth of almost seven times per year, and now with these frontier models, it's more like four times per year.

And that tracks with a paper we talked about a while ago on the three eras of deep learning compute, which did a similar thing, graphed these models, and showed that you can sort of divide the growth into these stages. So, certainly interesting, and no doubt we'll keep seeing this grow, perhaps at four times per year. We'll see if it's possible.
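If you're curious how a growth rate like "4 to 5x per year" gets estimated from a handful of models, here is a rough sketch of the standard log-linear fit; the (year, FLOPs) points below are invented placeholders, not Epoch's actual data.

import numpy as np

# Rough sketch of estimating a "growth factor per year" from compute estimates.
# The (year, training FLOPs) pairs are made-up placeholders, not Epoch's data.
years = np.array([2018.0, 2020.0, 2022.0, 2024.0])
flops = np.array([1e22, 3e23, 8e24, 2e26])

slope, intercept = np.polyfit(years, np.log10(flops), deg=1)  # log-linear fit
growth_per_year = 10 ** slope
print(f"~{growth_per_year:.1f}x more training compute per year")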

Jeremie

Yeah. This research comes to us from Epoch AI, which does amazing research in the realm of forecasting compute trends; we used them to inform some of the work that we did, the week we, you know, got covered in the press. And, yeah, they've got great researchers on this. They did have a report about two years ago that assessed that we were hitting 4 to 5x per year scaling of compute for the largest models. This, in a sense, is a confirmatory

report. It's saying, hey, yeah, that trend, it's still holding. They also do highlight a couple of things. I mean, you know, you called out, Andrey, this idea that scaling had been a lot more intense previously, compute scaling, and may have flattened out a little bit. And the may there is important; they themselves include a lot of uncertainty estimates in the assessment, saying, you know, it's hard to know exactly when one era of scaling ended and the other kicked off.

But what does seem to happen is around 2020, you have what seems like a, you know, I mean, call it a slowdown. It's still five x per year, right? That means that compounds every year. So in two years it's 25 x right. In in three years it's 125 x. So these models in terms of compute expenditure are growing outrageously fast. It's just not maybe the seven x per year that it was before then. they speculate about why that might be.

And they say that although there's not enough information to be confident about what caused the, again, call it a slowdown, it's still a radical acceleration of this stuff. But, you know, the five x: one of the explanations they came up with was this idea that in the early days, there was a sort of compute overhang. There was so much spare compute lying around. People weren't really wrestling each other to get their hands on GPUs. They weren't crazy expensive.

And so if you wanted to scale up more, you could kind of pretty easily just go to the next lab over and say, hey, can we use your GPUs to try something at a bigger scale? That's obviously a caricature of what actually would happen, but, you know, it was a whole lot easier to get your hands on that compute. Whereas you get to 2020, GPT-3 comes out, and that's correctly, at least in my opinion, identified as a major turning point in the scaling story for a lot of people.

This is when we started paying attention to generative AI and AGI. And so at that point, all of a sudden everybody's trying to get their hands on GPUs, the market competition kicks in and it just you start to be limited by essentially the the rate of, like the rate of decrease in the cost of, of compute rather than just like your ability to reach out and grab really readily available devices.

Last thing I'll mention. You know, you called out this idea that they compare different companies, in particular Meta, OpenAI and Google. They look at the scaling trends in compute for those companies. Google and OpenAI look very similar, a very consistent 5x a year starting in 2012, or 2016 in the case of OpenAI, and continuing on an ongoing basis, whereas Meta looks a bit sharper; they're at 7x a year.

And while that makes it sound like, wow, you know, Meta is scaling more aggressively, I think the most appropriate interpretation is they're playing desperate catch-up now, because for a long time, Yann LeCun was of the view that, like, we weren't really close to AGI, we're not going to double down on the scaling thing, whereas OpenAI was saying, no, no, scaling is the path to AGI. And Google institutionally seems to have a belief that's pretty similar to that.

Andrey

You know, there used to be a time where you could fit an entire deep learning model on one GPU, and I miss those days. Yeah, yeah, yeah. I mean, you can still do it with quantized smaller models, but not for frontier models.

Jeremie

It's not the same, though.

Andrey

Yeah. It used to be, researchers had a desktop with a GPU, and you could run your experiments anyway. Yeah, I will say, coming to the defense of Yann LeCun and Meta a little bit: in some sense, they did follow scaling. Everyone knew it was the path to go even before GPT-3, right? We knew more data, bigger models; it was a known strategy. And it was the trend since AlexNet, and Meta did play a role in that.

In particular, they had massive data sets for training segmentation models, image models, etc. Where they did lag was really adopting large language models and moving towards outrageously large things. And to be fair, again, I think it wasn't in the popular consciousness in the research world; as soon as GPT-3 hit and scaling hit, maybe we didn't understand the full extent of the implications, but certainly people did see that this was a big deal.

But, you know, something like justifying a $10 million training run was probably still not easy in these businesses and organizations.

Jeremie

I think that's the thing. Right? That is the test of whether you genuinely have conviction. OpenAI I put forward that that, those resources. Right.

Andrey

Yeah, and we're not trying to denigrate him. You know, he was right on deep learning, and he was early, and he pioneered tons of good research papers too. So definitely great, but sometimes wrong.

Jeremie

And I just like to call people out and not get called out myself, Andrey. That's what I like to do on the show. That's what this is.

Andrey

Onto the lightning round. First story: gzip predicts data-dependent scaling laws. So what they're saying essentially is you can sort of tell the scaling laws you'll see for a dataset just based on seeing how compressible the data is, without even training neural nets. You just kind of analyze your data. And they show this across six datasets of varying complexity, and these different data sets have different compressibility based

on that complexity. And they then find that on each of these, when you do, actually do this, that corresponds to what you observe with scaling laws when you do train neural nets. So very interesting result and perhaps very useful result.
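For a sense of what "how compressible the data is" means in practice, here is a tiny sketch using Python's standard gzip module; the two toy strings are invented examples, and the paper's actual methodology is more involved than this.

import gzip
import random
import string

# Tiny sketch of measuring compressibility with gzip: lower ratios mean more
# redundant, "fluffier" data. The toy strings are invented examples; the paper
# ties ratios like these, on real datasets, to fitted scaling laws.

def gzip_ratio(data: bytes) -> float:
    return len(gzip.compress(data)) / len(data)

random.seed(0)
redundant = ("the cat sat on the mat. " * 200).encode("utf-8")
dense = "".join(random.choice(string.printable) for _ in range(4800)).encode("utf-8")

print(f"redundant text ratio: {gzip_ratio(redundant):.2f}")  # much lower
print(f"dense (random) ratio: {gzip_ratio(dense):.2f}")      # closer to 1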

Jeremie

Yeah. This was for for me personally this was like the paper of the week. It just in terms of how interesting it was and it was written by one guy, by the way, is like like one author paper. So it's kind of a rare treat to see one of those. The big question is, you know, when you're going to scale up the amount of compute that you're going to throw at a model, so you have a certain compute budget, you got to decide how big do you want to

make your model, right. Because the larger your model is, the more moving parts it has that you have to tweak, the more parameters you have to tweak per, let's say per, weight update. Right. So you're going to invest more computing power into updating your models weights for each data point that you feed it. Versus should I just increase the number of data points? Should I use a small model but use way more data?

So do more cycles of weight updates, where each cycle is just updating fewer weights because the model is smaller? That's the big question. And for a long time people thought that the answer to that question, the optimal ratio of how big you make the model versus how large you make the data set, people thought that you should just scale them together, 1 to 1 roughly. That's sort of the Chinchilla scaling law result that we

got from, I don't know, 2022 or so, from Google DeepMind. What this paper says is, well, wait a minute, that may actually not be independent; that may be dependent on the quality of the data, the nature of the data, in particular how dense that data is. If you have a data set that is super packed with loads of information per token, so think about code, right? There's way more information per token than in just a poem or something, right?

Some languages are more efficient in terms of how much information they pack into small sentences. Right. So the more dense the, let's say less compressible your data is, the more complex it is, the less you can get rid of fluff. It turns out the more you want to lean towards data instead of model size as you scale your model. And I was thinking a lot about why that might be.

The two reasons I could come up with were, you know, maybe it's, for highly complex data, for very information-efficient data, you need, let's say, well, let me flip it around. If you've got language with a lot of fluff, a lot of unnecessary grammar rules, longer sentences than they need to be for the information they contain, you need to kind of store those stupid grammar rules somewhere, and they turn out to have to be stored in

the model weights. You need more model weights, a larger model. Whereas if you have a more compact, more efficient sort of data, data that's more information dense, then you need more data to train the model, because you want more diverse training examples to capture all the nuances in that data. So just a kind of back-of-envelope thought there; I thought this was really cool. Whole bunch of very applicable results here. They show they can reach the same level of performance

with 24% fewer FLOPs, 24% less compute, basically, using their compute-optimal scaling. So I thought that this was really cool. Check it out if you're interested in this kind of stuff.

Andrey

Yeah. Super interesting. And it also kind of plays into theories of intelligence, I think, where you have said often that, you know, you can basically say intelligence is compression. Certainly neural nets, what they're doing is compression in some sense. So I also agree this is very cool. Next paper we are talking about robotics which we don't touch on too often, but I am fond of. So the paper is neural scaling laws for embodied

AI. And they're talking about robot foundation models, which are, let's say, not quite as big as other foundation models, for many reasons; it's hard to get a lot of data, and so on. But they are an emerging field of research. DeepMind has published some research on foundation models for, let's say, manipulation, and there have been recent efforts to scale up data sets and so on. And this paper is looking at basically years of efforts on robot foundation

models. They find 24 robot foundation models and study scaling laws across six tasks and four methods. So, not too surprising, but it seems that there is variation there. It's not quite the same across domains, and certainly not the same as image generation or various other things. So in some sense, you can scale up in robotics and keep getting better results, which we already suspected. And again, this was already the case in 2016 and so on, that people were going there.

But now we have a slightly more scientific understanding of it, which is pretty important.

Jeremie

Yeah, I think seeing new scaling laws is really interesting. To the discussion we were having just at the tail end, the point you brought up about the fundamental theories of learning that this brings up, right, when you see scaling laws that apply in a new domain like robotics pretty robustly, as they seem to here, it kind of makes you think, yeah, you know, it's almost like a principle of, I don't know,

principle of physics. Right? Like the more data, the more compute you plow into a model, the more performant it is, at least by some kind of figure of merit. So.

Yeah. Interesting. Interesting result. Of course, one of the implications of scaling laws in robotics, or the implications of scaling laws in robotics can be a little different from, say, language or other kind of natively digital contexts, because often in robotic systems, but not always, you want to actually be able to deploy, at the edge, right? You want to be able to do on device deployments. And so you end up being constrained by just different variables.

Model size becomes a bigger constraint than necessarily things like data or compute. So you might find that, yeah, you know, we've got a $10 million budget, but for that budget, we could afford to train a much bigger model than we can actually fit on our edge device. And so you end up doing, you know, data-optimal training or other things. So, kind of interesting to see these laws laid out again, and hopefully more research like this going forward.

Andrey

Next: contextual position encoding, learning to count what's important. Another slightly nerdy topic that is very exciting if you're working on language models and foundation models. So, real quick: attention is very important for transformers, and part of how you do that is, when you encode the inputs to your model, you add on position encodings to tell you where in the sentence, for instance, or other place you are, because otherwise transformers are invariant to order.

So they don't know, like, this is the beginning of a sentence, this is the end of a sentence. And typically there's a lot of ways to encode position, but roughly speaking, you know, you want to say this is earlier versus later, and these ways are independent of context. They are independent of the content, so to speak. And this paper is saying that you can do a different kind of position encoding that is contextual, and that means incrementing position only on certain tokens that are determined

by the model. So you can attend, let's say, to the i-th occurrence of a particular word in a sentence. And if you do that, you enable the model to work on tasks that are otherwise hard for it, like selective copying, counting and flipping tasks. And they also show that this improves perplexity on language modeling and coding tasks. Basically, it makes things work better. So, pretty exciting.
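Here's a very rough sketch of that idea of incrementing position only on certain tokens. This is just a conceptual illustration of context-dependent positions (here, counting sentences instead of tokens), not the actual CoPE formulation from the paper, which uses soft, learned gates inside the attention layer.

# Very rough conceptual sketch: standard positions count every token, while a
# "contextual" position only increments on tokens that matter for the task
# (here, sentence boundaries). Illustration of the idea only, not the paper's
# actual CoPE mechanism, which computes soft gates from queries and keys.

tokens = ["The", "cat", "sat", ".", "It", "slept", ".", "Alice", "waved", "."]

standard_positions = list(range(len(tokens)))             # 0, 1, 2, ...

contextual_positions = []
sentence_index = 0
for tok in tokens:
    contextual_positions.append(sentence_index)           # position = which sentence
    if tok == ".":                                         # gate: increment on boundary
        sentence_index += 1

print(list(zip(tokens, standard_positions, contextual_positions)))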

Jeremie

Yeah. This is, you know, after having just ripped, Yann LeCun and Meta for their AI strategy. This is one where, you know, I flagged this paper because I think it's a great example of the kind of research meta can do. You know, it's a classic example where they they'll look at the transformer architecture, say, or, you know, whatever architecture, but these days, transformer. And they will look at what are these structural, let's say limitations or flaws of these

models. Right. What are things that no matter how much you scale this model, you should expect it to struggle with certain things. And the fact that these models, don't have the ability to, like, natively count anything other than tokens like the position encoding in these models, as you said, is it basically, you know, which token is the first? Which is the second, which is the third? Well, tokens can be

anything. And in particular, often when you look at things like, you know, byte pair encoding, tokenization, you have like something like words, syllables are a token or components of words or token. And so it's not always the case that if you count your tokens, you're counting your words. And that can make it challenging for the model to natively count. Say the occurrence of words of full words in a sentence, or sentences or things like that.

So if you want sentence-level reasoning, if you want paragraph-level or whatever other level of abstraction of reasoning that relies on the model actually knowing where it is, let's say, in a particular body of text at the sentence level or the word level, you need to find a way to have this sort of more contextual positional encoding. Right. And that's exactly why this is called contextual position encoding, or CoPE. I thought that was a funny acronym.

Yeah. Apparently performs better on coding tasks as well. You can sort of see why that might be. Right. The kind of logical requirement surfaces in that context, where you want to know where one variable ends and another begins. Maybe you want to be able to count the occurrence of things in, body of code because it has implications for logical reasoning.

But, again, this is one of those very elegant and simple things that meta can sometimes put out where they'll show you a failure mode of language models, and you kind of go like, you know, like it looks so dumb, and yet it's, it's this beautiful sort of. It reminds me of some of the best physics theory research that you can see, where somebody just shows something so simple but so

powerful. And in this case, I thought one of the best examples was, you know, they showed GPT four fail, on a task where they're like, just count the number of times that the name Alice appears in the sentence, and it couldn't do it. Right. So, that problem, of course, was solved with their, their, positional encoding strategy. But, you know, I thought it was a really interesting paper. And, hats off to meta for this result.

Andrey

Yep. Now you want to use CoPE instead of RoPE, right? And the last story for this section: new AI products, much hyped but not much used, according to a new study. So this is a survey of 12,000 people in six countries. And it finds, you know, as per the title, that there's a lot of hype, but these tools haven't quite reached the mass appeal stage. So only 2% of British respondents, for instance, say that they use such tools daily.

In general, if you just look at people that have heard of these things, ChatGPT is at 58% in the UK, 53 in the USA. All of the other AI models are lagging quite a bit behind: Gemini is at 24% in the US, Microsoft Copilot is in the 20s, and then you go down, you know, to the 7% range. Perplexity AI, for instance, very much hyped, is at 3% in the US as far as awareness. So yeah, I guess not too surprising. You know, it takes time for people to adapt and be aware of the new

technology. There is maybe a sense, if you're deep into this stuff, that everyone knows about it, and that's not true, as we've covered before. I will say I challenge kind of what this is saying, that there's a lot of hype but not necessarily a lot of actual impact yet. I mean, this has been the case with technology throughout time. You know, you could say the same about the internet, about smartphones. It's going to take time to

penetrate. But it is, I believe, the case that everyone is going to use AI soon enough. So don't take this to mean that the hype is not warranted.

Jeremie

Yeah. I mean, when you look at an exponential, right, obviously you're either too early or too late. You're either going to look like you're calling the trend way before it's a thing, or you're going to be sitting on the other side of the internet bubble in like 2015, calling the internet phenomenon. So, yeah, I think that's the case here.

You know, I'm old enough to remember that the idea of 2% of British respondents saying they use these tools on a freaking daily basis would have been insane. Like, a daily basis? That's a lot, man. That's a lot of usage. That's about where I'm at, obviously. Like, you know, we're junkies; I actually use it many, many times a day. But still, this is early days. You're talking about a market that has,

let's not forget, a giant chunk of people who are over 60, over 50, who don't necessarily think of technology as part of what they need to keep up with and don't necessarily dabble. There's a famous tech business book called Crossing the Chasm that kind of goes into this phenomenon, where, you know, it takes a long time to kind of break through to each new level of user.

But to your point about impact, I don't think that the metric that matters for impact is how many people actually use the services. It's how much value is being created by the services. Right. And for that like also. Yeah, just I mean, think about every time you use Google and you get a summary, you're consuming the output of a generative AI system.

It may not be a chat bot per se, but it is something that is very much inspired by and literally based on the breakthroughs we've seen, like GPT-2, GPT-3, and so on. So yeah, I think it's fun to say things are overhyped; it makes people sound, you know, very mature and all that. But I agree with you. I mean, I think overall it's an interesting stat. It's definitely less than I would have expected.

But it is significant and, and not necessarily the right metric to be tracking if you care about impact.

Andrey

Onto policy and safety. And we start with something a little bit spicy. Once again, we are talking about OpenAI drama, one of the stories that has come out this past week. And the title is: ex-OpenAI board member reveals what led to Sam Altman's brief ousting. And we've covered a lot of this ousting. You know, to recap, Sam Altman was removed by the board as CEO last November, and came back after like a week or something; that was a very exciting time.

And one of the people on the board who was on the side that pushed Altman out is Helen Toner, who is, let's say, more from a policy and safety background, less on the business side. And yes, in a new interview, she spoke and provided more details about what happened. So, for instance, she said that the board only learned of ChatGPT on Twitter and they didn't get a heads up, and said that in multiple cases there was a lack of transparency regarding safety processes and some even misleading

information. We sort of knew a lot of this already. I won't say that this is revealing what led to Sam Altman's ousting. We knew the general kind of story here, but certainly these are, dramatic new revelations with a bit more detail than we had before.

Jeremie

Yeah. And some of the detail I think is new and more specific. Right. So there are a couple things. There's a big podcast that she did with, I guess, the TED organization, and then there was this Economist article.

And one of the things that she highlighted is, you know, the board apparently believed that Altman cultivated a, quote, toxic culture of lying and engaged in, quote, behavior that can be characterized as psychological abuse. She is not the first person to make this kind of allegation. We've heard from various people, including, I think, one person who took to X; I'm trying to remember his name, he's, like, famous

in the community. But anyway, who's saying, you know, I've, I've dealt with Sam, for a long time. He's always been really nice to me, but I do know and have have seen multiple instances where he has lied to me and to others. Sam seems to be somebody who, based on these reports, does have

a lot of trouble with the truth. And that's concerning when you look at a guy who's running an organization with a mission as important as OpenAI's. Certainly, that hasn't been helped by the revelations we've had surface, and we'll talk about this. But, you know, OpenAI institutionally, it seems, silencing whistleblowers and former employees is certainly something that my team has been aware of for a long time.

As we've talked to whistleblowers at, the various frontier labs, you know, hearing about the the challenges that people have in making the decision to come forward, this is something that OpenAI has, does seem to have an institutional culture of secrecy, which contradicts, fairly significantly, their public, sort of messaging around how well, how open they are. And I think that's, you know, not great for accountability.

You know, Helen called out here as well that there were questions about security practices, safety practices, that they just weren't getting accurate or complete reporting on, she seemed to suggest, from Sam, and that the board just couldn't do its job and oversee that work.

So, interesting. You know, at the time, right when Sam was first let go, we talked about it on the podcast and I said, well, this looks sort of like, call it a somewhat botched firing, because we don't know the reasons yet. Now, having seen the reasoning, it starts to make a little bit more

sense. I do think that this statement about how, you know, they were taken by surprise by ChatGPT, like, I don't know, to me that doesn't seem as much of an indictment as it might sound. The reality is, at least the public story that OpenAI tells us, and that Sam Altman has been very consistent on from the start, was, yeah, you know, we launch products all the time, I didn't think much of ChatGPT. He did say he expected it to be successful, or, you know, to work

well. But they definitely were surprised at how successful it was. And you wouldn't expect the board to be notified about, like, every PR that's pushed up. So there's a bit of a continuum there of what is to be expected. There has been a rebuttal, too, in The Economist, published by OpenAI's current board chair, Bret Taylor, and he basically says, we're disappointed that Helen Toner continues to revisit these issues.

And he says all kinds of things; he kind of goes through the process, the story as we know it: you know, hey, we had an independent review that concluded that the board's prior decision was not based on concerns regarding product safety or security, the pace of development, OpenAI's finances, or its statements to investors, customers or business partners. Okay, there are a lot of asterisks in that we could get into and maybe don't have time for.

But I think ultimately it does not address the question of whether the board was kept up to date on these issues or not. There's a lot of very careful language here that, frankly, mirrors all too closely the very careful language that we've seen OpenAI deploy in the past, including in the context of these sort of whistleblower employment agreement things that have caused so much trouble in the last week. So, you know, I think this

is tricky. Again, these are serious allegations, right? Helen is coming out and saying, on multiple occasions he, meaning Sam Altman, gave us inaccurate information about the formal safety processes that the company did have in place, meaning that it was basically impossible for the board to know how well those safety processes were working or what might need to change.

So, you know, again, for a company that, as Sam Altman himself puts it, is on track to change, if they're right, the face of humanity as we know it, more openness is simply required. I can tell you that OpenAI employees themselves still don't know the full story. There are a lot of problems there if there's not openness even at the level of the employees in the

company itself. So the shiny luster may be coming off the company a little bit. And I think until we have answers to these questions, perhaps rightly so, people should be asking themselves fundamental questions about the governance implications of a lot of this behavior.

Andrey

Right. And this is coming out just following the Scarlett Johansson episode from last week, where there was a similar story where, you know, OpenAI was like, well, we didn't intend it, or it wasn't meant to be based on Scarlett Johansson, but Sam Altman did tweet "her", and it was revealed that he personally lobbied for this. So not exactly lying, but not exactly being fully transparent necessarily. And I think more than anything, these things are kind of shaping people's perspectives on

Sam Altman in particular, let's say OpenAI in general, but Sam Altman in particular. And it does kind of expand on a general understanding of Sam Altman. So, for instance, at his first company, what I read was, you know, employees came to the board asking for him to be ousted for, let's say, similar reasons. And it's worth noting that he was the head of Y Combinator, so he's very much in this Silicon Valley startup environment.

And one of the ethos of that world in general is to be kind of rule breakers, to fake it til you make it. You know, if you know how startups work, you may not always be fully honest in your quest to succeed.

Jeremie

So I will say, you know, having gone through Y Combinator, the ethos at YC is actually very much oriented towards being a positive force in the world and building things. And to the extent that you're breaking rules, it's because you assess that those rules are sort of poorly developed or not in the interest of the wider world, and some amount of rule breaking is necessary in life. I think the challenge with Sam is the extent to which this may

be pathological, and the extent to which, given the context, right, OpenAI is a very powerful company, and they become the most powerful company if Sam Altman's own thesis turns out to be true. And if that's the case, you just can't be having these sorts of questions. And again, you just rattled off a bunch of these: Scarlett Johansson,

and we had literally a few days before this the news that apparently Jan Leike, the former head of alignment at OpenAI, left in protest, saying, look, Sam promised us 20% of the compute budget that OpenAI had raised as of some date in 2023 to help with superalignment, and they did not deliver on that promise. You know, these are very substantial. These speak to the core mission of OpenAI that apparently is not being upheld by Sam Altman and by the board and by the executives.

These are really deep issues, and I think we are not even close to having the answers that we need on any of these questions. It would be great to have a little bit more clarity and transparency and, frankly, accountability. Ultimately, if if these concerns turn out to be valid. And that's something we we can't find out without more more transparency.

Andrey

Yeah. So there you go. Once again we are shilling, I guess. And speaking of Jan Leike, the next story is: OpenAI researcher who resigned over safety concerns joins Anthropic. So he went from one of the leading companies to another. And actually it's funny, Anthropic was a spin-out from OpenAI initially; the founding team was former OpenAI employees. So in some sense this is appropriate.

And, you know, he will be working on essentially similar things, this kind of superalignment idea of aligning AGI, alignment more for the long term, for superhuman models, less so for today's models. So yeah, not too much more to the story, but I think it's very significant. And Anthropic does position itself, and I think fairly so, as more safety-concerned and more focused on safety than OpenAI. So this is adding to that for sure.

Jeremie

Yeah, for sure. And this is consistent with, you know, what Jan Leike said when he left OpenAI, again in protest. At that time, he posted in a tweet, he said, we're long overdue in getting incredibly serious about the implications of AGI; we must prioritize preparing for them as best we can. So he's looking to continue his work on, as he put it, scalable

oversight. So basically, you can think of this as having smaller models oversee, in a reliable way, larger models, plus weak-to-strong generalization and automated alignment research, which had been a key pillar of OpenAI's strategy. They kept saying, look, we're going to use weaker systems to automate the process of doing alignment research to help us align increasingly powerful systems. A lot of people have criticized that strategy.

It to me, it seems like, you know, it's it's one of the strategies that could actually work. I think, it's it's incredibly risky if you if you take the risk seriously. But, the which which some may not for various interesting and potentially good reasons, but, still, I think it's. Yeah, it's a risky piece. He's moving over to anthropic. They've got a, as you said, an amazing track record on things like

interpretability in particular. So it'll be interesting to see those efforts paired with the scalable oversight and weak-to-strong generalization stuff. The last thing I'll mention, not covered here, is Ilya Sutskever, who also left OpenAI, interestingly, exactly six months, it seemed, after the attempt to oust Sam Altman and his return. So it kind of makes you wonder, you know, if there is some provision for a period of notice that he had to give or something,

which is why we hadn't heard from him. We still don't know: where's Ilya, as the meme goes. We still don't know what he is doing or what project he might be working on.

Andrey

And onto the lightning round, where we'll try to be a little quicker. But as you said, there's a lot going on this week. And, you know, we just got finished, let's say...

Jeremie

Oh, Jesus.

Andrey

Criticizing Sam Altman. But unfortunately we've got to go back to it with this next story. So the story is that leaked OpenAI documents show Sam Altman was clearly aware of silencing former employees. You've talked about how this was one of the controversies that emerged with regards to the safety team. One of the employees came out on Twitter saying that they gave up their equity because they didn't want to sign an NDA.

And in the agreement when you leave the company, apparently you had to sign very strict requirements not to talk negatively about OpenAI and not to disclose various things, or OpenAI could claw back the equity, the ownership you had in OpenAI, amounting to a lot of the money you made with the company, a huge amount of value. Right. And so Sam Altman, you know, weighed in after this blew up on Twitter, saying, oh, no, I was unaware of this.

But we haven't clawed back any equity, and we're going to go out and change the system. Well, the story says that he signed off on the policy, you know, so it seems like he knew about it. I don't know.

Jeremie

Yeah. You know, man, I think the headline here is maybe a little ungenerous, you know, saying he was clearly aware. The reality is his signature was on a series of corporate documents that established this practice. Right? That formalized it. Okay. That's serious. You really should be reading the corporate documents that you sign, especially if you are the CEO of this company, and especially given your company's commitment and your own personal commitments to openness and transparency.

Absolutely true. That being said, does everybody read all the fine print? No. That being said, this is really fine print that, you know, if I'm a betting man, I am putting my money on Sam. And you knew this was a thing. We have other I have other reasons to suspect that, but, but long and the short of it is, you know, this doesn't look great. And you know this claim as well, that, there have not been any clawbacks on equity.

I, I understand from talking to some folks, who have experienced the process of departing that, that this is maybe technically true, but in practice, not necessarily true. That is my understanding of this. Again, this is one of those things that kind of, causes me to raise an eyebrow, looking at the ease with which Sam Altman seems to put out these statements. We just heard an awful lot of this, like, technically true, but actually stuff. And. And this doesn't look great.

I mean, you know, the list just keeps getting longer, and now, you know, here we are with something that's really high stakes. And OpenAI arguably only moved on this after there was that big Vox piece that came out calling them out on this practice.

You know, there's all kinds of stuff. If you actually look at the interactions, too, between the OpenAI employees who are trying to depart with their equity and the OpenAI people, they are just outrageous. I mean, the OpenAI folks telling them things like, look, we're just doing this by the book.

It's, you know, standard procedure or whatever for us to include these very, very aggressive non disparagement clauses, basically saying for the rest of your life, you can't criticize OpenAI at the cost of losing your equity. That by the way, for vested equity, not normal at all, despite the apparent characterization that, OpenAI institutionally

had been offering. So a lot of big problems here, a lot of questions that I think are rightly being raised, across the board when it comes to OpenAI, OpenAI and its governance here. So hopefully this leads to some positive changes and more transparency.

Andrey

Yeah. And if nothing else, maybe he didn't know the fine print, but the overall strategy, the overall approach to this, yeah, he is the CEO, you know. So yeah. And clearly now we will get comments saying we are OpenAI haters. So there you go.

Jeremie

Well I've been a fan of them for a long time. You know that's the sad thing is, is I genuinely have been an OpenAI fanboy for forever and, believe that they were, you know, operating with, with the best of intentions. And I think that may have been the case. I do think that that may well have changed, unfortunately, and I've had to update quite a bit just based on talking to people, talking to people who've left, talking to people in the ecosystem, to

whistleblowers. This is not the same company that it was when it was making a lot of its most lofty and ambitious commitments to safety and responsibility. And that's unfortunate. And I think that, unfortunately, it also means self-regulation is no longer really on the table, as long as you have companies that could be the way OpenAI is. Like, you can no longer just count on people, you know, doing their

own grading, grading their own homework, so to speak. So this is really unfortunate on a lot of levels.

Andrey

Yeah. And I think that's true even if you just look at research output, you know, OpenAI used to put out more and now it's more Anthropic. Yeah. And one more story about OpenAI; there's a lot of these. So there's a new safety and security committee, which is led by directors Bret Taylor, Adam D'Angelo, Nicole Seligman and CEO Sam Altman. And so the committee will be responsible for making recommendations on safety and security.

This is presumably in response to some of these things like the superalignment team departures. And the first task will be to evaluate and develop OpenAI's processes and safeguards over the next 90 days. Afterwards, it will share recommendations with the full board, and OpenAI will publicly share an update on the adopted recommendations. I've seen some criticism of this, as, you know, the board is full of

insiders. They may not be quite, let's say, challenging enough of the commercial interests of the company, since Sam Altman is on the committee. So I guess let's not get our hopes up too much. But that's how it looks.

Jeremie

It just seems like a bit of a trend. Like, every time there is a corporate governance shuffle, it's Sam Altman who ends up on the committee that's deciding what to do about it. So, you know, you might argue that's not the most constructive and healthy form of oversight, especially given the leaks that we've been hearing. You've also got to respect the game from OpenAI on the PR side here. They're basically saying, hey world, all the scandals?

Yeah, whatever. We're setting up a new safety and security committee. The superalignment team? Sure, its leadership totally just left us. Yeah, yeah. But we've set up this new committee. By the way, we're training GPT-5. Anyway, alongside the announcement of how the committee is going to be set up, they just dropped this line: OpenAI has recently begun training its next frontier model, and we anticipate the resulting system to bring us to the next level of capabilities on our path to AGI.

So, by the way, I think "recently" there is doing a lot of work, it's a very load-bearing word. I suspect the model has been in training for quite some time. But in any case, this is clearly, well, I shouldn't say clearly, this is quite possibly a kind of PR response to, understandably, all the brouhaha. You might do this even if you were well-intentioned, right? So they're just kind of trying to pick up the pieces. They're bringing on board some great people to

join this committee as well. Yes, we can rip on them for having Sam there, especially in light of the criticism, but they do have a bunch of good policy experts. They have John Schulman, who's the head of alignment science, kind of taking over after Jan Leike's departure, and Jakub Pachocki, who is their new chief scientist. So, you know, that's good. But you've got to wonder, with what we've seen with the shake-ups at OpenAI, is there a credible governance structure?

Is accountability, is transparency actually going to happen, or is Sam Altman's personality just going to run over all the objections that are raised? Hard to know. Bret Taylor, a very vocal Sam Altman advocate, and in fact the guy who wrote the rebuttal to Helen Toner and Tasha McCauley that we read earlier, will be chairing this committee. So, you know, when we talk about a committee friendly to Sam Altman, that's sort of where that's coming from.

Andrey

Right. And, well, we've crossed the two-hour mark, which we haven't done in quite a while. So we're going to try and wrap up. But hopefully a few of you are still with us, because I think these OpenAI stories are pretty notable and worth knowing about. Yeah. And the next story is a bit less of a big deal. It's about the person who cloned Biden's voice and made those robocalls, I believe around the time of the primary, trying to tell people not to go out and vote.

Well, now we know that that person was fined $6 million by the Federal Communications Commission. And this is a pretty important thing. This is apparently the maximum limit, and the actual amount paid could be significantly less. But it's a bit of a precedent-setting move, right? Because we haven't seen too much of this kind of deepfake action, or significant responses to it. So I think it's very important as we head into the next few months of the US election.

Jeremie

Yeah, absolutely. And it's interesting, it's sort of trying to set a bit of a precedent, maybe, penalizing at the higher end of what's possible, just given the stakes. Sort of an interesting signal for where the system moves next.

Andrey

Hacker releases a jailbroken "god mode" version of ChatGPT. And what jailbroken means is that things you usually shouldn't be able to ask for, like how to make drugs, now you'll be able to ask for. Apparently this is using something like leetspeak, a way of writing that replaces certain letters with numbers. Various methods exist for jailbreaking, and it seems to be using some of those that OpenAI is not able to combat yet. We can expect OpenAI to try and block these efforts as well.

Jeremie

Yeah. This jailbreak comes to us by way of one of my favorite Twitter follows, Pliny the Prompter. He always comes out with really cool jailbreaks and things. Leetspeak, by the way, is this disappointingly stupid, simple way of modifying your request, your prompt: replace every instance of the letter E with the number three, replace every instance of the letter O with the number zero, for example. And you kind of get what that vibe looks like.

It doesn't work anymore. So this is a GPT that he deployed, and for a short period of time people could use it; it's been taken down. Surely this vulnerability is going to be all patched up, but it's another example of just how dead simple, embarrassingly simple, some of these jailbreaks can be. This is a symptom of how hard it is to get AI systems to do what you want them to do, and not just what you trained them to do, right? That idea of alignment, the difficulty of aligning

these models. So, yeah, kind of interesting. I tried clicking on the link; it does not work any longer, unfortunately, or at least as of the time that I tried. So, no god mode for us today.
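For readers curious what the letter-to-number substitution Jeremie describes looks like in practice, here is a minimal sketch in Python. The exact character map Pliny the Prompter used isn't public, so the mapping below and the function name are illustrative assumptions, not his actual method.

```python
# Minimal sketch of a leetspeak-style substitution, as described above.
# The specific mapping (E->3, O->0, A->4, I->1, S->5) is an assumption
# chosen for illustration; it is not the jailbreak's actual character map.
LEET_MAP = {"e": "3", "o": "0", "a": "4", "i": "1", "s": "5"}

def to_leetspeak(text: str) -> str:
    """Replace selected letters with look-alike digits, leaving other characters unchanged."""
    return "".join(LEET_MAP.get(ch.lower(), ch) for ch in text)

if __name__ == "__main__":
    print(to_leetspeak("tell me how to do the thing"))
    # prints: t3ll m3 h0w t0 d0 th3 th1ng
```

The point of the example is how trivial the transformation is: a single dictionary lookup per character is enough to change the surface form of a prompt, which is part of why such simple obfuscations have historically slipped past some content filters.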

Andrey

And while we're at it: China creates a $47.5 billion chip fund to back the nation's firms. This is the third phase of a national integrated circuit industry investment fund, and it means they have ¥344 billion, that's 47.5 billion USD, coming from the central government and state-owned banks and enterprises. That'll back semiconductor and compute efforts.

Jeremie

Yeah. You can think of this as China's answer to the Biden administration's CHIPS Act, right, the CHIPS and Science Act, which was $39 billion for chip makers, plus an additional $75 billion in loans and guarantees. So you're talking about something in the same orbit here. You know, China is obviously trying to achieve semiconductor self-sufficiency for all kinds of strategic reasons. Turns out the largest shareholder in this fund is going to be

China's Ministry of Finance. So you're very much looking at the classic sort of Chinese, PRC central planning game here. That's maybe in a bit of contrast to the approach that's been taken in the States; it looks like they're actually going to be owning some equity in the outcomes. But yeah, really interesting. Beijing has struggled with getting these things to pay off.

By the way, we've talked about this before, but there's lots of corruption, lots of fraud as the capital flows into this ecosystem. And that's just the cost of doing business for them. They need to burn some of that cash to take bets on a large number of potential semiconductor companies in order to achieve what they're trying to achieve.

Andrey

And on to the next section, synthetic media and art. We only have one story, so this is the last story of this big, long episode. And it is: Alphabet and Meta offer millions to partner with Hollywood on AI. So, for instance, Warner Bros. Discovery Inc. has expressed a willingness to license some of its programs to train video models, although it sounds like not everything.

Meanwhile, Disney and Netflix aren't willing to license their content to these companies, but have expressed an interest in other types of collaborations. So this is a bit of an overview article, and I also found it interesting that it notes that no major studio has so far sued a tech company over AI. So these Hollywood companies, unlike, let's say, music companies, are maybe taking a more friendly stance in general.

Jeremie

Yeah, yeah. I'm really curious how this shakes out. There's so much money at stake. And one of the challenges they face too, right, is just the democratization of the ability to generate Hollywood-quality video. I mean, it's not impossible that video production gets, like, Midjourney-ified in that sense, so we get Midjourney for video, or Stability for video,

in a very reliable way, sometime in the coming years, coming months maybe even. And at that point, you know, you're really leveling the playing field between Hollywood, with its insane production costs, which is obviously why they're interested in this tech in the first place, and the indie developers, indie producers, I guess, if you will. I'm not a Hollywood guy, but, you know, that

that population. So this could signal a structural shift long term in the way that movies and content are generated and consumed.

Andrey

And with that, we are done with this episode of Last Week in AI. Again, I'll mention the newsletter at lastweekin.ai. And yeah, this was a good episode to have you back for, Jeremie. Lots to talk about and lots of cool drama, let's say.

Jeremie

Yeah. No shortage.

Andrey

So as always, I will mention that if somehow you're not subscribed, we would appreciate it if you did. In particular, if you head over to the YouTube channel, it's called Last Week in AI, it might be helpful to subscribe and like and stuff, I don't know, algorithms rule all these systems. And speaking of algorithms, if you do stop by and give us a review on Apple Podcasts or Spotify, that will help us reach more listeners. And we will like that review and any feedback you've got.

But more than anything, we do enjoy people listening, so please do keep tuning in and enjoy the AI generated song that will now start.

Jeremie

Ooh!

Unidentified

Oh, hey. Hi. Oh, wow. Oh, hey. Hi. Oh, wow. Oh! Jeremy. Marisol. Hey. Hey, now. Last week. Last week. News insights. Minds as well.

Jeremie

Oh, breakthroughs can't slow down.

Unidentified

Jeremy back with flair. This week. This week I do feel like.
