
#184 - OpenAI's Voice 2.0 + execs quitting, Llama 3.2, To CoT or not to CoT?

Oct 02, 2024 · 2 hr 30 min · Ep. 223

Episode description

Our 184th episode with a summary and discussion of last week's big AI news! With host Andrey Kurenkov and guest host Jon Krohn.

Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.

If you would like to become a sponsor for the newsletter, podcast, or both, please fill out this form.

Email us your questions and feedback at [email protected] and/or [email protected]

In this episode:

  • OpenAI, Meta, and Google are enhancing their AI assistants with advanced voice modes, while Meta released Llama 3.2, an open-source model capable of processing both images and text.
  • Significant AI infrastructure developments include Groq's partnership with Aramco for a massive data center in Saudi Arabia, and Microsoft's plan to power data centers using a reopened Three Mile Island nuclear plant.
  • Recent research shows chain-of-thought prompting is most effective for math and symbolic reasoning, while OpenAI's o1 reasoning model is being integrated into Perplexity AI's search platform.
  • AI is being rapidly integrated into various sectors, with examples including ChartWatch reducing unexpected hospital deaths, Snapchat and YouTube introducing AI video generation tools, and Lionsgate partnering with Runway for AI-assisted film production.

Timestamps + Links:

(01:28:01) Outro

Transcript

AI Singer

Welcome back to the AI scene. Where am I taking? This door is not lean. You get whispers of robots talking to us. Shifted open, you're causing quite a fuss.

Andrey

Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual, in this episode, we will summarize and discuss some of last week's most interesting AI news. And as always, you can go to lastweekin.ai for our text newsletter with even more AI news, and also, for the podcast, links to all the articles we discuss, which are also in the show notes. I am one of your hosts, Andrey Kurenkov.

I finished a PhD in AI from Stanford last year, and I now work at a generative AI startup. And as we've been saying the last couple episodes, Jeremy is busy having a baby and not sleeping, as of recently. So congrats to him; he'll be off for the foreseeable future. I'm not sure how long it takes to adjust to that, but presumably some time. Uh, so we do have a guest co-host, and it is once again, as he has done now

Jon

several times, uh, Jon Krohn. Yeah. It's such an honor to be here. Thank you for inviting me back, Andrey. I was really flattered. In the most recent episode that was published, it said, I guess we're going to have Jon or one of our other regular co-hosts back on. I was like, yes, they're calling me back in. And congratulations to Jeremy. That's a huge deal. Wow. I mean, it doesn't get bigger than having a baby. So I wonder if he's going to even have time to listen to this episode.

I doubt it. And congratulations to any listener out there who's recently had a kid; we're really rooting for you. Um, yeah. And so I guess my two-line biography: um, I host the world's most listened-to data science podcast. Um, it's called Super Data Science, and, uh, unlike this show, it isn't a news show, really. It's more of an interview show; about two thirds of episodes have a guest. We do an episode every Tuesday and Friday.

And, uh, sometimes on those Fridays, about half of the time, instead of having a guest, I kind of do a deep dive into a topic. So, um, a recent one, for example: I did a half hour on the o1 model from OpenAI, because, as you guys said, it's, I think the quote from Jeremy was, "the release of the quarter, at least," and I absolutely agree. It's a big, sensational thing.

I won't go into detail on that because you guys have already covered it, but, uh, yeah, that's the kind of thing we do. I'm also co-founder and chief data scientist at an AI company called Nebula, which is, yeah, automating white-collar processes with AI. And yeah, actually something that recently happened, Andrey, is I started doing some stuff on TV last year, and I have a couple more things, which I can't publicly announce yet but are in the works, going to happen very soon on TV.

It's pretty exciting.

Andrey

Wow. Sounds exciting. Yeah. And, uh, you've had what, 800? More than 800 now,

Jon

right? Yeah, more than 800 episodes now. And we've had both you and Jeremy on the show. We'll see how quickly my, uh, spreadsheet can load so that I can find your episode. Yeah. So your episode is episode number 799. And, uh, that was a really great one. We got deep into

conversations about AGI and artificial superintelligence and how that's going to transform society in that episode, which, you know, wasn't really my intended topic to cover, but we got deep into it. And incidentally, that is what at least one of these big TV projects is about. It's creating a show for the masses about how everything could dramatically change in the coming years as machines eclipse us with their intellectual capacity.

And yeah, Jeremy's been on the show a bunch of times as well. Uh, he was most recently on episode 545, it looks like. So people can check those out if they want to learn more about the great hosts of Last Week in AI.

Andrey

Yeah. Yeah. Those are pretty fun. You do get to hear a bit more background and whatnot there. We do try to keep it mostly to news here, though sometimes a bit of personal stuff sneaks in. Also, I misspoke: Jeremy's was episode 565,

Jon

not 545.

Andrey

And as usual, before we get into the news, I do want to spend a bit of time acknowledging reviews and comments. Uh, we got a few more on Apple Podcasts. That is always nice. Uh, I like this one: "No idea how you get bad reviews on anything covered in this." You know, maybe... I do think we have some flaws. Yeah, I don't know. It's nice to have some, uh, let's say constructive criticism, but thank you for that praise. Uh, got another one from Andrew, which I think is pretty fair.

"I love it, but does not release consistently." That has certainly been true recently. I will try to look into hiring an editor, because that is a major bottleneck, me having to do the edit, and I have pretty, you know, specific requirements that I try to get done. So it hasn't been, uh, quite as easy as you would like it to be, but, uh, anyway, let's hope, let's hope I manage to keep it a bit more consistent going

Jon

forward. It's a tricky chicken-and-egg kind of situation with creating a podcast, because you're expected to be releasing material consistently; like, your listeners expect it every week, ideally probably on, like, the same day or something, which is tough to have happen. But you have a full-time job, Andrey, and you do all of the editing. And so then to go out and hire an editor, you'd need to have, you know, consistent revenue.

But then getting consistent revenue, like sponsorship, that kind of thing, that's another time commitment, which would eat into your ability to edit and release episodes. So it's a tricky, tricky situation to get these things rolling.

Andrey

But I do find it pretty fun. It's always, it's a fun thing to have a podcast. I guess that does drive me to do it. And just one more shout-out before we get into the news. We've had a couple of comments on YouTube, which is always fun to see. And in particular, I think it's... Tangalo commented on the last few, and in the most recent one actually commented on the intro and outro song.

Uh, actually commented that it got a low score, and suggested, like, a type of theme for the next one, a type of music. Which actually is helpful, because every time I'm generating a song, I'm just like, well, what genre should I do this time? You know, what haven't I done yet? So if you do have any requests for the type of music for the intro or outro, feel free to send an email, comment on YouTube, one of these things, and I will certainly probably give it a try.

What was your prompt for the most recent one? Was that like David Bowie or something? It was like 60s psychedelic or something, synth-y, but 60s rock. Uh, I usually try, like, a couple, and that one had something interesting going on. I have so many, like, drafts of songs that are just discarded. All righty, enough prelude, let's get into the news, and we begin with tools and apps, where we have some kind of exciting developments in the emerging space of conversational AI.

So starting out with OpenAI: they are rolling out advanced voice mode with more voices and a new look. So we've had this voice mode thing since we got GPT-4o, O for Omni, now quite a while ago, and in the introduction to that model, kind of a big deal about it was that it was conversational. You could speak to it, not just via text, but via your voice. And it was almost real time, right?

It was, uh, kind of pretty groundbreaking, I would say, as far as conversational audio interaction with an AI. And so with this new update, which is rolling out to paying customers, so the Plus and Teams tiers, they'll be getting custom instructions and five new voices. They call them Arbor, Maple, Sol, Spruce, and Vale. Not sure what those voices are like, but, uh, hopefully Maple is really, like, sticky and rich. Yeah. And so that means that there's now, uh, nine voices that you can use.

And, uh, some other stuff going on here: uh, it's getting a new design, now represented by a blue animated sphere, which is, I guess, cooler than just the little, uh, music or audio notes that they had. And, uh, that's about it. Uh, they've been kind of slow to roll out this feature, and so it seems that they are now gathering enough data to improve accents, to be able to expand voices, and so on.

And, uh, it appears it's primarily available in the U.S. This is not available in the EU, the UK, Switzerland, and various other

Jon

places. Yeah, this is the future. It's kind of obvious to say this is one of those mega trends moving away from having to be able to type. This allows so much more flexibility in how you interact with AI. At the time of recording, it was just a few days ago that Meta had a big announcement about a deeper integration with Luxottica on their Ray Bans and having integrated AI in those. And yeah, so that kind of AR experience or just auditory in, auditory out.

can be supported by much smaller devices. You don't need to have a screen open in front of you. You don't need to slow down whatever you're doing to use your fingers to type. And so this is obviously the future. And yes, this advanced voice mode from OpenAI is right at the cutting edge in terms of capabilities.

Andrey

Yeah, I feel a little embarrassed to not have tried it personally, because it's been kind of available for a little while now. So, uh, I guess that'll be homework for me after this episode, to give it a try

Jon

and see how I feel. This is a bit of a tangent, but I know you guys go on tangents on the show, so I'm not gonna feel that bad about it. Andrey, have you ever tried Apple Vision Pro? Uh, I did try it

Andrey

once, and I was a bit of a hater. I, I prefer... Oh really? I prefer Meta's, uh, Quest product. Oh. I'm a big VR fan actually, so, oh, that, that was the reason. Yeah. But it is definitely cool. Yeah,

Jon

I loved it. I had a really, like, emotional experience in there. I was, like, almost brought to tears. I went to an Apple store and did a demo, and they

end the demo with this really, like, gut-wrenching video of beautiful things: soccer goals being scored right in front of you, and you're, like, looking around the stadium, and a woman tightrope walking, and sharks. And it's just, like, they try to make this really immersive and beautiful vision of the world and excitement, and you're like, wow, it's so great to be alive. It's amazing, even better with this headset on than being out in the world. Um, I don't know.

Anyway, I liked it a lot, but, uh, yeah, just kind of tangentially related to, you know, where it's going. It seems like with this kind of thing, like augmented reality, it's easy to imagine how in 10 years, maybe instead of having a bunch of screens around your desk, you can just have all of the space in whatever room you're in being occupied by screens. And I think the only real limitation right now is battery life, because you can only do two hours on the Apple Vision Pro.

Um, and so you can't, like, completely replace your office setup right now with that.

Andrey

Right. And, uh, another trend we're seeing that will be closely tied to that is this conversational stuff. If you have, on top of that, augmented reality, or even not augmented reality, just, like, a smart glasses kind of feature, then you will have a sort of always-on ability to talk to AI assistants and whatnot. So that'll be very closely tied in,

Jon

if we do get that feature. Exactly. For sure. One last quick thing on this: I asked, why don't you guys have a power cable instead of having this be battery powered? Because then I could use this all day. And they actually said no, because part of the issue is overheating. Hmm. Um, so right now the device just probably couldn't run for, like, a full workday.

Andrey

Yeah. Well, first generation, so it's going to get better. Next up, quite a related story about Meta: their AI can now talk to you in the voices of Awkwafina, John Cena, and Judi Dench. So they have this AI chatbot on kind of a lot of their apps, Instagram, WhatsApp, and Facebook, and they've had various bots that you can talk to. And now you do get these celebrity voices, in addition to non-celebrity voices, also named: Aspen, Atlas, and Clover.

I really like these non-human names that are, like, uh, coming from nature. And according to the Wall Street Journal, Meta is paying these celebrities millions of dollars for their voices. So again, uh, interesting to me; I feel they've tried this before already, prior to the expansion with Llama on all these platforms. So quite an investment from Meta, and I guess they're trying to get people to interact with their AI kind of offerings across these apps.

I can say I still haven't used anything, and I use WhatsApp and Instagram, and I kind of just ignored the AI on that front. I'm sure many people are in the same boat. So perhaps this is meant to

Jon

address that. I have used it in WhatsApp. It's just, it's not very interesting. For people who are regular listeners to this podcast, you're probably quite aware that you're using a Llama model, and it's probably not the most capable Llama model if it's, you know, the free one in WhatsApp. And so, you know, you get the kinds of responses you'd expect. There's nothing mind-blowing; you're not getting state-of-the-art capabilities

like when you use Claude 3.5 Sonnet. And so, I don't know, it isn't very compelling. It doesn't seem obvious to me right now that I need to be having a generative AI conversation in WhatsApp instead of in the ChatGPT app or the Claude app, which is equally close to me on my phone.

Andrey

Exactly. Yeah. I have the same impression. You know, I do use chatbots quite a bit in my workday, and just throughout, uh, doing tasks, I just open a tab in my browser to go to either Claude or ChatGPT, and these kinds of things don't start in those apps. But who knows, maybe we'll make friends with our AIs soon enough. Now that it has a John

Jon

Cena voice, I'm going to be in there. Yeah, that's, that's what I need to hear. One thing I guess to call out here, which might be obvious to listeners already, is these different strategy choices between OpenAI and Meta, where, on the one hand, there's something that you talk about a lot on the show, where OpenAI has gone down the closed AI route of, you know, keeping their models proprietary, whereas Meta is the biggest proponent of open-source models.

And then another strategic difference here is that OpenAI, for whatever reason, has decided to try to not be partnered with specific celebrity names and to just have those generic tree names, even though they obviously did famously get in trouble with Sky being too close to Scarlett Johansson and being threatened with legal action by her.

But, um, but yeah, it's a different, it's an interesting strategy for Meta to say, okay, you know, we're going to deliberately partner with these celebrities to get their voices in, and OpenAI is, seemingly, except for the Scarlett Johansson blip, deliberately avoiding that. So I don't know if I have, like, an explanation or something bigger to say about that, but I wanted to call out that distinction

Andrey

onto the lightning round, which sometimes means we go through these quicker, sometimes not, but at least it should be quicker. And once again, it is on this theme: Gemini's voice mode is out now for free on Android. So we saw this Gemini Live voice chat mode, uh, around the same time as GPT-4o was announced; very similar in that it's, you know, semi-real-time conversation with a language model, in this case Gemini. And it is now out for, you know, a huge swath of Android users.

It's only available in English, but they are saying that it will arrive later with new languages. It's also only on Android; it will arrive on iOS later. But if you are using, uh, Android and are speaking English, you can speak to it pretty easily. There's, like, a new waveform icon in the bottom right corner of the app, uh, or overlay of Gemini. So another example of, you know, this kind of modality making pretty rapid progress.

It's only like a few months ago that we even got demonstrations that this is possible. Now it's kind of rolling out wide.

Jon

Yeah. And I'm a Gemini fan. I actually subscribe to all three of ChatGPT, Claude, and Gemini, because there are different use cases for me that end up being valuable. And so for Gemini, it's the real-time web search, as well as very large files, 'cause, uh, I think the biggest context window right now is 2 million tokens, which is a lot. So, you know, you can pass in a huge amount of audio or video, uh, which sometimes comes in handy for things related to the podcast in particular.

And yeah, so I'm a fan of Gemini in general, and it's great to see that they're making this progress on the voice front as well, just like the other big players that we've mentioned in this episode already, Meta and OpenAI. And yeah, I think the big thing here for all of these companies is going to be, like you said, near real time. It's once this becomes actually real time.

And you can see with Gemini here, um, apparently, I haven't used the Gemini voice functionality, but apparently you can interrupt the AI mid-sentence, which is going in that direction. But it's going to be where you feel just like you and I talking on this episode right now, Andrey, where it seems like everything that I'm saying can be, uh, responded to kind of really reactively, really in real time. And something that's going to be interesting is.

Once, so for a long time, you and Jeremy recorded this podcast audio only, but you went on video to record it. And that's because there's lots of extra information that we provide with our faces and our gestures that allows us to have an even better conversation. And so it's going to be interesting.

And I, I bet that some of these companies, maybe all of them, have already been prototyping ways of having your webcam or your VR goggles be able to react to your facial expressions, because then you could have things like... and this is completely speculative, I have no idea. But I use an Opal C1 camera, and OpenAI recently acquired Opal, which is a hardware company that makes cameras. Why would they be doing that?

And it seems to me like there's a great opportunity there, because imagine you're looking at the outputs in a ChatGPT conversation and you furrow your brow as you're reading it. Then, all of a sudden, it might stop and say, oh, I'm sorry, I realized I just made a mistake, without you having to type or do anything. Like, there is no technological reason why that couldn't happen.

And so it's probably just a matter of kind of like testing and feeling like you, um, have worked out the kinks and maybe, you know, ethical issues that could be associated with it.

Andrey

Yeah, that's some interesting speculation. I hadn't thought about that, but it would make a lot of sense that to actually make the conversation seamless, being able to see your face and see those signals of, like, oh, let me jump in there, uh, would definitely be useful. Next, moving away from conversing with AI, we are moving on to AI for video, another big trend this year. And the story is that AI video rivalry has intensified as Luma announces Dream Machine API hours after Runway.

Kind of a long title, but pretty informative. So both of these companies, Luma and Runway, kind of the leaders in the space of AI video generation, I would say, are now offering APIs. And APIs are what you can use programmatically. So instead of going on the, uh, user interface on the website and clicking buttons, right, writing text,

you can now have some code that sends a request, and this is how you would make, you know, another app, or, uh, perhaps, you know, integrate it with your company's product, whatever. So you now have this API; it's connected to the latest version of Dream Machine. They say that it's priced at $0.32 per million pixels generated, which is a fun way to think about it, I guess.

And, uh, that's for Luma. Just hours prior to that, Runway also launched their API, where you actually fill out a form to get access to it now, versus Dream Machine's API, which is available for everyone to use, I believe. And it similarly has Gen-3 Alpha Turbo, with different pricing plans, pretty equivalent. So another trend where, you know, it's not that long ago that we started to see usable video generation, so to speak, and now they are trying to actually commercialize it with

Jon

these moves. And you started off by saying Luma and Runway are the leaders in text-to-video, which I guess is true for easily accessible models, because of course there is that juggernaut Sora out there that few people have access to.
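[Editor's note] Since the episode frames Luma's pricing in pixels, here is a back-of-envelope sketch of what that unit could imply. This is purely illustrative: it assumes the $0.32-per-million-pixels figure quoted above, and it assumes billing counts every pixel of every generated frame, which may not match Luma's actual metering.

```python
# Toy cost model for per-pixel video pricing.
# ASSUMPTIONS (not from Luma's docs): the $0.32 per million pixels
# rate quoted in the episode, applied to every pixel of every
# generated frame. Treat this as arithmetic, not a price sheet.

PRICE_PER_MILLION_PIXELS = 0.32  # USD, assumed rate


def clip_cost(width: int, height: int, fps: int, seconds: float) -> float:
    """Estimated cost of one generated clip under full-frame metering."""
    pixels = width * height * fps * seconds
    return pixels / 1_000_000 * PRICE_PER_MILLION_PIXELS


# Example: a 5-second 720p clip at 24 frames per second.
print(round(clip_cost(1280, 720, 24, 5), 2))  # prints 35.39 under these assumptions
```

Whatever the real metering unit turns out to be, the useful takeaway is that cost scales linearly with resolution, frame rate, and duration, so a 1080p clip would cost 2.25x its 720p equivalent.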

Andrey

Moving on to Microsoft: they have some new stuff, and it is called Copilot Wave 2. So they have this whole Copilot branding that is kind of everything Microsoft has with AI. Here, it's focused on Microsoft 365; that's all their tools: Word, Excel, PowerPoint, et cetera. So this will have a chat interface called Business Chat, which allows you to combine knowledge, uh, from your company with web-based information, and you can create these collaborative documents called

Pages, similar to Word documents, but they can be expanded and collaborated on by team members. Kind of reminiscent of Artifacts from Claude, and some of these things like Perplexity, which also allows you to publish the results of your searches. There's also another new feature, switching between web and work modes, that enables

Copilot to tap into knowledge contained in all work documents. And Gemini has some similar functionality, I believe, in Drive, where you can talk to it with awareness of some documents that are relevant to your context. Last note: you know, presumably there are quite a lot of things here, but an interesting one is that PowerPoint has a new narrative builder that creates a whole deck of slides from a prompt using

Copilot, including transitions and speaker notes, which, uh, for those working in, uh, the corporate world, might be kind of exciting.

Jon

Yeah, I'm not really a Microsoft user, but they are certainly doing a lot in this space, and they're investing a lot behind it. Uh, yeah, you shouldn't be surprised if they emerge, like they have for decades now, as the leader in enterprise applications of computing, now with AI being at the forefront of computing.

Andrey

One last note: uh, these features are not quite out there. So the Business Chat and Pages features are rolling out; these other features have been announced but will be appearing in public previews later this month. And there are quite a few features here, so Microsoft is still pretty aggressively expanding their AI suite. And for the last update here, we have Perplexity introducing a new reasoning-focused search powered by OpenAI's o1 model.

So this is designed to solve puzzles, math problems, and coding challenges. It's a feature currently in beta, available to paying Perplexity Pro users, and currently very limited, to just 10 queries per day, probably because o1, uh, OpenAI's model, limits your API access. Uh, and it actually does not integrate with search. So this is essentially, it seems like, kind of a connection to o1, just a way to use it indirectly.

Uh, but Perplexity has integrated various models pretty rapidly, and this is another example, where o1 just rolled out and Perplexity now has it in their tool.

Jon

Yeah, pretty curious decision. I, yeah, I'm not sure. It's not obvious from this that you get any extra value from Perplexity AI, but I guess if you're already a Perplexity AI subscriber and you're not using a paid version of OpenAI, then this gives you, you know, some access to o1. So I guess that's the advantage.

Andrey

I wonder, have you tried o1? Because this one I have played around with and actually used for pretty complex coding, and I was quite impressed. I will say, compared to GPT-4o, it is leagues better at handling

Jon

complex things. Oh, absolutely. So, uh, it was episode 820 of my podcast, which came out on September 20th. I do almost half an hour exclusively on o1, and in it, I do a lot of testing of the model as well. And so if you watch the YouTube version, you can see a screen share of me trying different things out. And some of the most impressive things include that I was copying and pasting questions.

I, I teach a calculus for machine learning curriculum, and I was taking some of the relatively advanced... it's an introductory calculus course, really, but I was taking some of the more advanced questions that I have in that curriculum, like partial derivatives, and I was copying and pasting them in, and it absolutely knocked them out of the park in a way that you would never even think you could try with GPT-4o.

Like, if you tried that with GPT-4o, you know that the next-token prediction is just, it's an estimate. It's not doing math. Um, it's not going to be checking for mistakes. And so the odds, with even a moderately complicated partial derivative calculus question, of you getting to the correct answer are very small with GPT-4o. But with o1, every time, every exercise that I put in, integrals too, worked really well.

And the way that it, uh, expressed the answers as well was really impressive. If I had a student in class that expressed it the way that o1 did, I'd be like, wow, you should be teaching this class. Um, and so it also helps with, uh, with understanding problems. So, um, with these kinds of things, like the partial derivatives, the integrals, o1 would think for about 10, 12 seconds. But I tried a much more complex math problem that, candidly, I didn't understand.

I didn't understand this question. But it took about 90 seconds of processing to get to the answer, and then provided me with a response after that processing that allowed me to actually even understand what the question meant and to get to the right answer as well. So, super impressive.

I also tried... so, there are lots of coding tasks that I expect models like GPT-4o and Claude 3.5 Sonnet to be really great at. So it took me a while to come up with an idea for something that I wouldn't expect Claude or GPT-4o to get correct, and eventually what I thought of was creating an interactive website that is a neural network, where I can hover over nodes in the neural network and get the bias at that node, for that neuron in the network. And if I hover over an edge in the network, it shows me the weight of that connection in the neural network.

And so I was like, that sounds pretty complicated. And, you know, it was a long prompt with lots of details. It didn't a hundred percent get it right; like, there were things, like I asked for arrows going from left to right to show forward propagation through the network, and it wasn't rendering arrows, it was just rendering straight lines.

But other than that, it nailed all of it, the interactivity, in HTML; it was a huge amount of code. And, you know, I mean, so it shows the potential. Yeah, this is, in a nutshell, a lot of it: as Jeremy described it, having this additional way of scaling. So we already had, you know, scaling laws associated with making these LLMs larger and larger, and scaling the amount of data. Now you can also scale inference time.

And so there's an obvious extension, which Noam Brown, one of the researchers working on o1 at OpenAI, wrote about in a really great Twitter post: how this kind of processing can be expanded. You can just add inference time; instead of thinking on the seconds scale, you can think on minutes, hours, days, maybe weeks. And then you could put in really complex questions, like solve cancer, and come back six months later and, you know, see how it's doing.

I'm being a bit reductive, but there's a huge amount of potential here. And as for the cost: right now, the cost is obviously very expensive, but like everything else in computing, the cost is going to continue to decrease at crazy rates. We're going to figure out software cleverness to make it much cheaper at inference time, and hardware is always getting cheaper as well. So we already have, for specific use cases like physics, chemistry, biology, o1 demonstrating PhD-level capabilities.

And so even if we just stayed at today's level of capability with o1, you know that it's going to get cheaper in the coming years. And so then you theoretically have a basically limitless number of PhD-level hard scientists all around the planet, just crunching through problems. That is a serious game changer.

And then on top of that, when you factor in things like longer inference time and more cleverness around the way that they train these models, this isn't just going to be PhD level in hard sciences. This is going to be, it seems obvious to me, more than PhD level, like, serious, you know, that level or beyond in, you know, any imaginable quantitative subject.

And so as that becomes really cheap, then, you know, you have what is effectively unlimited intelligence, far beyond human capabilities. Like, we're already looking at that now, these kinds of things that seemed like science fiction just a year ago,

Andrey

right? Yeah. o1, to your point, to this whole story of how you tried it and I've tried it: it is a big deal, right? As we already covered. And it is a big deal not just in the sense that it is very impressive, but also in terms of what it represents as a new paradigm of improvement, beyond just scaling training, to scaling kind of how much you can think. Now, I will say there are some caveats there.

There are some challenges to just getting o1 to be better, but definitely exciting. And from my personal experience and your experience, you know, believe the hype, so to speak. And moving on to applications and business: once again, as is often the case, we have some stories related to OpenAI that are not related to o1, but instead some drama with their, uh, executive team and their corporate structure.

So the main story here, which just came out today, is dramatically titled "OpenAI execs must quit as company removes control from nonprofit board and hands it to Sam Altman." I will say that's being a little bit overdramatic, but it's also not entirely wrong. So the big part of this, the first wave at least of what I've seen, was the company CTO Mira Murati announcing her departure. And she has been there for six and a half years, so a pretty significant move.

For some background, she was involved in some of the drama with Sam Altman being removed from his position as CEO last year; she was the interim CEO. She expressed some concerns, it sounded like, although I don't believe she was directly pushing him out. Regardless, she is now announcing her resignation. She's saying that she wants to pursue other opportunities, explore options. So there's not, you know, any huge drama here.

But in addition to her leaving, the VP of Research and the Chief Research Officer are also stepping away, which does seem a little significant, that there have been these departures. And this follows some notable researchers leaving just earlier this year. And all of this is coming alongside another development.

We don't have concrete, uh, details yet, but there are more and more rumors coming out about this planned move away from being a nonprofit to being basically entirely for-profit, more traditional. And reportedly Sam Altman is also going to get equity in OpenAI, basically some amount of control and some amount of, uh, monetary benefit from being the CEO.

So, yeah, another chapter in the exciting and quite, uh, dramatic history of OpenAI's governance, uh, corporate structure,

Jon

whatever you want to call it. Yeah. A lot of people, a lot of big people have left now, obviously, Ilya Sutskever being the biggest researcher and, you know, one of the co-founders. Greg Brockman is still there, but other than that, I can't immediately think of names other than Sam Altman of people who've been around for many years in leadership and are at least, you know, prominent publicly. Um, like, yeah, so a lot of these people have now left.

It does seem interesting, because think about how well the company's doing commercially and in terms of their brand. I mean, I don't know, do you think more than half the time on this podcast, Last Week in AI, the opening story is an OpenAI story? Yeah, it's wild. They're so much at the forefront of what's happening in AI, period, that you'd think people would be clinging to that more than ever before.

You're like, wow, this thing that we've been working on, it's really working. But simultaneously, people are leaving. So yeah, these things like going for-profit, uh, you know, becoming closed AI. Um, yeah, I guess that changes things, I guess,

Andrey

it's kind of funny, like, uh, the term closed AI, or the criticisms of OpenAI being closed off, may go back to like 2019 when, you know, I was a PhD student. Um, the first few years of OpenAI, it was like, you know, some weird R&D lab doing reinforcement learning research and putting out some open source packages for people to do reinforcement learning with. And they did contribute quite a lot. And one of their early developments, PPO for reinforcement learning, was kind of a big deal.

So, you know, it was a very gradual shift, in some sense, away from being a more open, pure research company to now being a massive, you know, multi-billion-dollar business that seems to be shifting very much away from being this R&D lab with a focus on AGI having, like, you know, benefit for all, capped profit, all of these kind of weird things that are inspired or motivated by this idea that they are going to get to, uh, AGI.

Again, we don't know the full details of this supposed move to for-profit and so on. We don't know if the departures are directly related, but, you know, as ever, interesting stuff seems to be

Jon

happening at OpenAI. The other interesting thing to highlight, which you mentioned only in brief but which I think deserves a bit more attention: you talked about how Sam Altman is getting equity now, well, supposedly he's going to get equity in the new way that OpenAI is structured. And it is interesting, and I don't know all the details on this; I feel like it's the kind of thing maybe you know a lot more about, and I think Jeremy would have for sure.

My memory of it was that Sam didn't have equity, and that always was weird to me, the CEO not having equity. And now it would supposedly be 7 percent, is what's being reported. And that's a lot. That's a lot, because startups often reserve only about 10 percent of their equity for all employees, including the CEO. So 7 percent is a huge amount.

Andrey

Definitely. Right. That's a pretty significant amount to get. Obviously, if you are a founder, you have a big chunk of the equity early on, but to get it from a position of not having any equity is a big move. One last thing, I guess it is worth covering some of the statements. So, you know, Mira already put out a statement that was very positive, right? Uh, no criticisms of anything. Sam Altman posted a statement to X, and let me just read this.

"Leadership changes are a natural part of companies, especially companies that grow so quickly and are so demanding. I obviously won't pretend it's natural for this one to be so abrupt, but we are not a normal company, and I think the reasons Mira explained to me (there's never a good time, anything not abrupt would have leaked, and she wanted to do this while OpenAI was in an upswing) make sense." So there you go.

Yeah. Like let's not, uh, you know, make it sound like OpenAI is definitely in some turmoil,

Jon

but it's just interesting. It seems like there might be more to the story than gets tweeted publicly. And in the same way, very similar to the tweet that you just read from Sam Altman, there was the same kind of thing from Greg Brockman. Uh, you know, "I have deep appreciation for what each of Barret, Bob, and Mira brought to OpenAI. We've worked together for many years." It's a very long post.

And in fact, if anything, I mean, I realize I'm just generating drama here. But, you know, even that: everyone's kind of watching, what is Greg Brockman going to say? What is Sam Altman going to say? And then they all have these really nice things to say. It seems like it could be a bit, uh, manufactured, I guess. But yeah, again, this is completely speculation; maybe there's nothing more to it at all.

Andrey

Next, another story on OpenAI, and this is, you know, in a bit of an ode to Jeremy, let's do a safety story. So Sam Altman is departing OpenAI's safety committee. This is their internal safety and security committee, which oversees safety decisions related to the company's projects. It will now be an independent board oversight group, chaired by Carnegie Mellon professor Zico Kolter and including several other people.

And, uh, you know, the committee will still be responsible for things like safety reviews, including their review of things like o1, and will continue to receive regular briefings from OpenAI's safety and security teams. So for me, it's hard to read fully into what this implies. It sounds like maybe it's becoming more independent, which is something you'd want, right?

You don't want Sam Altman to be on a committee that is overseeing and evaluating how safe something is that he may want to release for commercial interests.

Jon

Yep. I wish I was Jeremy so that I could provide a lot more rich color on the safety aspects of this, but I'm not. So onto the next story.

Andrey

On to the lightning round. First we have "Chip startup Groq backs Saudi AI ambitions with Aramco deal." So we've been talking about Groq pretty frequently these past few months. Groq is a designer of advanced chips for running AI models, for inference in particular. They have these language processing units, and they do appear to be a leader here in how fast they can run models such as Llama.

Now they have partnered with oil producer Aramco to build a giant data center in Saudi Arabia. This is going to have apparently 19,000 language processing units initially, and Aramco will fund the development, which is expected to cost on the order of nine figures, according to an interview with the CEO.

They say the data center will be up and running by the end of the year, and it could later expand to include a total of 200,000 language processing units, which would make this a massive, massive AI inference center, right? We've heard, uh, Elon Musk, of course, saying that he'll have 100,000 GPUs up and running at their AI supercluster. So another one being worked on here.

Jon

Yeah, an interesting thing for me here is that this is providing a huge amount of capacity to one of these countries that is quite neutral between, say, the West and the other axis, which could be China, Russia, Iran. So you have kind of these two different, um, industrial systems that are to some extent becoming decoupled a bit, although it isn't like the kind of Soviet-era decoupling that was experienced before.

Um, but you know, there are efforts to have, you know, more and more tariffs and sanctions going both ways between those two groups. And Saudi Arabia is one of those countries that kind of sits in the middle. It freely does trade with both the West and the other axis.

And so, yeah, it's an interesting play, and it's interesting how, at least for now, there doesn't seem to be any resistance from, say, US politicians to selling 200,000 language processing units to Saudi Arabia, which obviously is a critical partner to the US in the Middle East. Um, but yeah, they're also a critical partner to China. So it's, yeah, it's interesting. I don't know a lot about geopolitics, but yeah,

Andrey

I'm sure there are a lot of nuances here with the relations and implications. Either way, from a business perspective, this is clearly a good thing for Groq, and they seem to be on the rise and a leader in the technology. With this kind of investment, they can actually scale up and provide services to many more companies, really compete in the cloud

Jon

AI inference space. On the other side of things, the bad thing for Groq, G-R-O-Q, this company: it's got to be so annoying for them that X has named their big model Grok, G-R-O-K. And on to our next story.

Andrey

Which, uh, yeah, you have a nice segue there. Uh, so G-R-O-K Grok has, uh, been in the news lately, partially because of the new iteration of it, but also because they have integrated image generation with Flux, the image generator from Black Forest Labs. And this story is about that company, Black Forest Labs. They are raising a hundred million at a one-billion valuation. This company was co-founded by engineers behind Stability AI.

So basically some real advanced people who were involved in some of the pivotal technology and progress in AI image generation. The company is actually pretty new, pretty fresh. They have previously raised 31 million. And it sounds like they are now going to raise a bunch more and get into the 1-billion-valuation AI startup club, which is, you know, not as easy to do nowadays as it was maybe a year ago. So Black Forest Labs, clearly a lot of excitement about

Jon

what they're working on. My favorite thing about Black Forest Labs is how delicious it sounds. I can't hear Black Forest Labs without tasting the taste of Black Forest Cake, and I think that's unique amongst tech companies. No other tech company name makes me salivate.

Andrey

Yes, I do think they have a very cool-sounding name, and it's definitely more fun to say than some of the other ones out there. Next, another story related to a trend that we cover pretty frequently on the show. It is about a new humanoid robot, although this one is apparently semi-humanoid. So this is from Pudu Robotics.

They're introducing the D7, a semi-humanoid robot with an 8-hour battery life and 10-kilogram lift capacity. Another one that is seemingly going to be somewhat affordable and practically usable, potentially. It was unveiled by them in May as part of a long-term strategy for the service robotics sector. So we may potentially see this in stores, et cetera. Final note: what does semi-humanoid mean? To be clear, it means that this has no legs.

So basically it drives around on a wheeled base, but on top it has a torso, arms, and a face. It's kind of, uh, a Roomba with a torso on top, which I guess we haven't seen many other people do. So perhaps the semi-humanoid

Jon

tack here will help. It makes a huge amount of sense, though. I actually hadn't seen this kind of design before, the semi-humanoid design with the wheeled bottom and torso top, but it makes a huge amount of sense. Because if you think about it, going up and down stairs, I guess, is something this wouldn't be able to do, which a fully humanoid robot could.

But being able to go up and down stairs comes at the expense of a lot of extra compute and probably also power, just to stabilize. Whereas when you've got wheels and they're just supporting a torso, that's going to be great for battery life. And it's going to be useful for a lot of situations, because now you can be doing manual manipulations on

you know, countertops, or at a bunch of different levels where humans can access things with their own arms, um, without, yeah, all that extra expense and complexity and presumably battery usage associated with legs.

Andrey

Right. Exactly. The video they released, which is about a minute and a half, shows a lot of use cases. One of the ones they highlight is retail scenarios: goods sorting and shelving, taking bottles and placing them. So you can see how this maybe doesn't require legs; it's actually better to be able to drive around.

Jon

I've got to say, you know, whereas Black Forest Labs, I think, is a great company name (it makes me salivate, and it's easy to know how to spell it and type it), I've got to say, Pudu. P-U-D-U? Poo-doo? I mean, come on, it just sounds like something kindergarten kids came up with.

Andrey

I don't know, maybe, uh, it depends on some background. It's based in Shenzhen, so perhaps we just don't get it. They're like, it's delicious! You haven't tried poo-doo? And on to the last story. It's about Amazon, and they are introducing Amelia, an AI assistant for third-party sellers. So this is described as an all-in-one generative-AI-based selling expert.

It will be available in beta to select U.S. sellers, and it is, you know, an ongoing demonstration of Amazon trying to include generative AI. They have introduced an AI-powered shopping assistant named Rufus and a business chatbot, Q. They have introduced, I believe, a thing that helps people who leverage AWS understand how to do so. This is yet another tool on that, uh, trajectory.

And apparently more than 400,000 of Amazon's third-party sellers have used its AI listing tool, which is an increase from just 200,000 in June. So, you know, maybe third-party sellers will get a lot

Jon

of use out of this. It's interesting to me that most of the other big tech companies, in fact maybe all of them, try to brand their generative AI capabilities under one name, whereas Amazon, I guess as a reflection of its kind of corporate structure (you have those famous two-pizza meetings, always trying to have small, efficient teams), has ended up creating this kind of collage of lots of different models.

Rufus the shopping assistant, the business chatbot Q, now Amelia. While I totally get how under the covers this is different infrastructure, different model weights, and it makes a lot of sense from a data science development perspective, from a marketing perspective I think it's pretty confusing. And I don't know, I've seen Rufus show up in my Amazon shopping experience recently, and it's not something I use. It's just like having Meta AI in WhatsApp.

I'm like, ah, I'm good.

Andrey

On to projects and open source. And even though we're almost an hour in, this could perhaps be the top story of the week, certainly if you're someone who follows AI news closely. Meta has released Llama 3.2, a major update: it is capable of processing both images and text. So this has been one of the limitations of Llama 3 and Llama 3.1. Even as they've gone really big, they haven't been able to take in images, which, of course, GPT-4o and other models can.

Now you can give it images, and they say that you can use it for various AI applications, like understanding video, visual search, things like that. There are, uh, two vision models: an 11-billion-parameter vision model and a 90-billion-parameter one. And along with that, Llama 3.2 comes with two lightweight text-only models, with 1 billion and 3 billion parameters, somewhat similar to what we've seen from Microsoft with Phi and from Google with Gemma.

These are condensed models that appear to be working really well. They seem to be condensed from the very big Llama models into these relatively small language models that are still capable of a lot. This came along with a bunch of announcements they demoed. So they also announced that their Ray-Ban Meta glasses are getting more features. They demoed live translation from Spanish to English. They have a prototype augmented reality thing, which of course will integrate AI, presumably.

Uh, but certainly having an open source model that is very good and now capable of dealing with images is a bit of a game changer, in the sense that Llama 3 was already very important to the AI ecosystem as a large language model that is at least sort of competitive with the frontier models. And now it can do images, you know, which makes it an even more useful tool for people who want to build something without relying on OpenAI.

Jon

Totally. Yeah, I mean, kudos to Meta for continuing to support these kinds of open source releases. I'm hugely grateful myself, as someone running the data science function, uh, at an AI startup; we are able to leverage these models, and they are our preference. With the Llama architectures, we get a huge range of possible sizes and lots of, um, different fine-tunings, you know, for code generation or chat applications. It is a huge service to everyone who is developing and deploying AI models.

And so I'm deeply grateful. You know, we are able to have our own proprietary models running on our own infrastructure, at least in terms of training, at an almost trivial cost; it costs hundreds of dollars, thousands of dollars, to create really well fine-tuned models for the tasks we need them for. If we had to do the pre-training ourselves, it would cost the hundreds of millions of dollars that Meta is investing here.

And so that also ties into what we talked about earlier in the episode. We were talking about all these new voices being added in and Meta trying to push us, as consumers of these free products, into using these AI tools.

And part of why that push is happening is because executives at Meta, Mark Zuckerberg, would like to be able to show investors that they're getting some kind of return, that there's, you know, an increase in engagement in Instagram, say, as a result of these billions of dollars being spent on open source research.

Andrey

And the next story is quite related: Alibaba has unveiled Ovis 1.6, a new multimodal large language model. So Alibaba is a leader in AI hailing from China; they are, I don't know, something like the Amazon of the East, you could say. And they have revealed this multimodal large language model, meaning basically it's the same as Llama 3.2 in the sense that it can take in both images and text.

So Ovis stands for Open Vision, and they introduce some new techniques in the architecture for multimodal large language models. They have a full paper out on this stuff, and this new version 1.6 is an update released under an Apache license. Uh, and they have the model out and a demo out, and it, of course, is better: it is trained on a larger, more diverse, and higher-quality data set and also has instruction tuning.

So it seems to be performing pretty well on various benchmarks, uh, beating out other multimodal large language models out there on just about every kind of benchmark, including Qwen2-VL, which we covered last week. Probably not beating Llama 3.2, but who knows, it might be similar. So an exciting week for multimodal large models, for sure.

Jon

Yeah, the biggest thing for me from this release is this term, which I had not come across before: MLLM, multimodal large language model. I really like it, because the naming has been a bit confusing. Originally, we had LLM describing just text in, text out.

LLM is really straightforward because we've got language going in and language coming out; even if that's a programming language, okay, it's still language. But then as these became multimodal, sometimes they get described as a foundation model, which isn't as widely used a term and isn't unambiguous. And so I like this MLLM.

And part of what I like about it is that even if it's multimodal, even if it has vision capabilities, say as in the case of Ovis here, it's still relying on language, um, at an abstract level, in order to deliver the vision capabilities that it has. So the language capabilities of the LLM enhance the world model that the vision capabilities have to work with.

Um, and they're complementary in that sense.

Andrey

Right, exactly. The model itself, without getting too technical, is kind of smooshing together images and text. You can almost say both get tokenized, converted into these sets of symbols, then converted into big vectors, bunches of numbers, and then ultimately both go into a big neural net together. So there's like a soup of representations of images and text, uh, and a lot of cross-referencing between images and text.
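To make that "soup of representations" a bit more concrete, here is a toy NumPy sketch of the fusion step. The patch size, embedding dimension, and random projections are all made-up stand-ins for learned components; this is an illustration of the general idea, not how Ovis or Llama 3.2 is actually implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy text embedding table (sizes are made up; real models learn these).
VOCAB, D = 1000, 64
text_table = rng.normal(size=(VOCAB, D))

def embed_text(token_ids):
    """Look up one D-dimensional vector per text token."""
    return text_table[token_ids]                      # (n_tokens, D)

def embed_image(image, patch=8):
    """Cut an image into patches, flatten each, and project to D dims with
    a fixed random matrix standing in for a learned projection."""
    H, W = image.shape
    patches = [image[i:i + patch, j:j + patch].ravel()
               for i in range(0, H, patch)
               for j in range(0, W, patch)]
    proj = rng.normal(size=(patch * patch, D))
    return np.stack(patches) @ proj                   # (n_patches, D)

# One fused sequence: image "tokens" followed by text tokens, the kind of
# soup of representations the transformer then attends over jointly.
image = rng.normal(size=(16, 16))
sequence = np.concatenate([embed_image(image), embed_text([5, 42, 7])])
print(sequence.shape)  # (7, 64): 4 image patches + 3 text tokens
```

Once both modalities live in the same sequence of vectors, the transformer treats them uniformly, which is what enables the cross-referencing between images and text.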

So, uh, there's potential there: you can scale up, train on more images and more text, and, you know, you might get better overall results. Moving on to research and advancements, we have a couple of papers. The first one is "To CoT or Not to CoT?", and the answer to that question is that chain of thought helps mainly on math and symbolic reasoning. So CoT is chain of thought; we've mentioned it quite a few times.

To recap quickly, it is just telling your model, you know, think a bit, first enumerate your chain of thinking, basically think through the problem, and then give me the answer, instead of just jumping to the answer. And there's been a lot of research on this, a lot of known, uh, ways of using it. Of course, o1 from OpenAI has chain of thought baked in and is trained to do chain of thought, or you could say reasoning, which is certainly related to chain of thought.

chain of thought or you could say reasoning that is certainly related to chain of thought. In any case, this paper is looking at whether chain of thought prompting is actually useful and makes VL1 better and what they have found from quite a large analysis of over 100 papers is that It does help on math and symbolic reasoning, but on other tasks like common sense, reasoning, text classification, context aware QA, it is, let's say, less of a difference.

It doesn't give you quite as much of a jump in performance. So this is, let's say, maybe more of an empirical paper, right? They're just demonstrating. a result of a lot of different evaluations of using this. Perhaps not fully surprising that chain of thought and kind of enumerating your thinking is mainly useful for math and symbolic reasoning, logical reasoning, these kinds of things that require you to talk through kind of a problem.

But another nice finding for trying to get a better sense of how LLMs work.

Jon

I would say here, I have a reductive oversimplification that has served me well in terms of understanding where chain of thought, or where an o1-style model, would outperform a GPT-4o-style model, and that is, um, Daniel Kahneman's thinking fast and slow kind of, uh, paradigm. So, uh, Daniel Kahneman, who just passed away earlier this year, was a Nobel-prize-winning economist, famous for decades for creating lots of research, particularly with someone named Amos Tversky, about how human brains work. And one of the big conclusions

That's what I'm doing right now. Words are just coming out of my mouth. They're just spilling out. I'm not planning what I'm saying. It's just happening. And that's like GPT 4. 0. And so on. tasks where you're generating an email or you're editing the copy of a document that you wrote. That kind of problem can be handled very well by that kind of system one, just spewing words out without thinking ahead, um, that GPT 4. 0 can do.

And so in the o1, uh, research announcement a couple of weeks ago, they did comparisons across different subject areas, and on things like I just said, like writing an email or editing text, GPT-4o might even do better than o1, or they're at least comparable in human evaluations.

They perform about 50-50 on human evaluations. Whereas it's tasks that leverage your slow thinking, system two in that thinking-fast-and-slow framing, where, you know, when I'm tackling a math problem that is not something I've already become adept at, I need to break it down into parts, I need to spend time just staring at the problem with pencil and paper. And it's those kinds of problems where you have to stop and think.

That's where chain-of-thought types of systems like o1 significantly outperform. And so that's kind of my heuristic: when I'm thinking about a task that I might need an LLM for, I ask, is this a system-one kind of task, in which case I'll probably go to Claude 3.5 Sonnet today, or is it a system-two kind of task, like writing some very sophisticated code or, uh, doing math, for sure.

Um, if I had physics problems regularly, like maybe Jeremy would, uh, he's probably just always sitting around at home doing physics problems, right? Um, then, you know, in those kinds of scenarios, I would go to O1 right away. Exactly.

Andrey

Right. And, uh, they do show that on those types of problems, math and symbolic reasoning, to get into the numbers, you see improvements of, let's say, 20 to up to 50 percent, pretty big possible improvements. Versus if you're looking at commonsense reasoning, you might still see some improvement, uh, depending on your context, but it might be just a few percent, for instance. So you really don't require it.

And it makes sense: if you are able to do a quick commonsense response, or if it just requires knowing some knowledge, there's no need to plan ahead, as you say. And the next paper, quite relevant, on a related topic, is titled "LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench." This paper is using a notion I haven't heard before, the term large reasoning model, LRM.

Not sure if it has been used before, but, uh, they do say that o1, or Strawberry as it was codenamed, is claimed to be a large reasoning model designed to overcome limitations of LLMs. They show in this paper that it does represent a significant improvement on sort of classical planning tasks, but still isn't capable of doing long plans of, let's say, 10, 12, 14 steps. So how do they test this? What is planning in this context?

What is PlanBench? The idea here is that in AI there's a sort of whole category of problems known as planning problems, where essentially you have a goal state you want to get to, represented in some set of variables. So one example they are working with in this paper is the Blocksworld problem, where you have a set of blocks stacked on each other in various physical configurations. You have a set of actions: picking up a block, unstacking a block from on top of another block, putting down a block, and stacking a block on top of another block. And so you may have a configuration of the blocks that you want to get to from an initial configuration, and you need to carry out a sequence of these actions, picking up, stacking, unstacking, one by one to get there.

And there's a whole set of algorithms going back to the very beginning of AI and robotics, like PDDL and Shakey from Stanford, that solve this kind of thing, that find you the sequence of actions that gets you from one state to another. And you can do this, uh, you know, if you have exact actions; you don't need machine learning, even, you can just do planning.

So here they evaluate o1 to see whether, given a description of a state and some actions it can take, it can generate a valid plan. And it certainly does perform much better than GPT-4o and just about any other language model, but

as you increase the plan length, uh, going to, again, 12, 14, 15 steps, it drops to 0 percent correctness, in the context of this kind of Blocksworld problem, where there are just a few types of actions and a very specific state you want to get to. Unsurprising, in some sense: this is comparing

soft reasoning, so to speak, neural net reasoning, to something that is much more algorithmic. Planning in this sense is essentially doing search through a sequence of steps; it's kind of a playground for straight-up algorithms, straight-up search code routines, and not so much neural

Jon

networks. So, uh, yeah, like you said, I like this new term, LRM, large reasoning model. And this falls very neatly into the exact same thinking-fast-and-slow buckets I was just talking about, where LLMs, the way they're talking about them here, do that fast system-one thinking where you're just spitting words out, and the LRM takes time to reflect and plan before generating an output.

Andrey

On to a couple more stories, starting with an announcement rather than a paper, something a bit more fun. The story is that Norwegian startup 1X has unveiled an AI world model for robot training. So 1X is one of the leaders in the space of humanoid robots. They have their robot, Eve, and they say that they now have an AI-based world model. And this world model is essentially meant to enable various types of tasks in dynamic environments.

Uh, so it can simulate the world, and that's what a world model is: essentially the ability to predict what will happen if you take a certain physical action. And they trained it, using simulation and various techniques, to get a state-of-the-art world model, quite a bit better in actual contexts of robot handling of, let's say, clothes or individual objects, opening doors, things like that, enabling it to act pretty reliably in novel environments.
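As a toy illustration of the core idea, predicting the next state from the current state and action so a robot can "practice" without touching the real world, here is a sketch with made-up linear dynamics fit by least squares. 1X's actual system is a learned video world model, far beyond this; the environment and numbers here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "real world": a 2-D point whose true dynamics are next = state + 0.1 * action.
def true_step(state, action):
    return state + 0.1 * action

# Collect transitions from the real environment...
states = rng.normal(size=(500, 2))
actions = rng.normal(size=(500, 2))
nexts = true_step(states, actions)

# ...then fit a linear world model by least squares on (state, action) -> next state.
X = np.hstack([states, actions])               # (500, 4)
W, *_ = np.linalg.lstsq(X, nexts, rcond=None)  # (4, 2)

def world_model(state, action):
    """Predict the next state without touching the real environment."""
    return np.concatenate([state, action]) @ W

s, a = np.array([0.5, -0.2]), np.array([1.0, 0.0])
print(world_model(s, a))  # close to true_step(s, a), i.e. about [0.6, -0.2]
```

Once the model is fit, you can roll out imagined trajectories by feeding predictions back in as states, which is what makes "virtual" robot training cheap compared to real-world data collection.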

So exciting if you're a robotics person, if you are excited about the humanoid progress we've been seeing. One more note: they have also launched the 1X World Model Challenge, where they are incentivizing more progress by offering over 100 hours of video data and a pretrained model, and having a cash prize for people able to provide improvements on top of that.
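As a minimal sketch of the world-model idea described above (not 1X's actual video model, whose internals aren't public): a world model is just a learned function from (state, action) to next state, and once you have one, candidate action sequences can be evaluated in imagination instead of on a physical robot. The dynamics function here is a trivial hand-coded stand-in for a learned neural network.

```python
def world_model(state, action):
    """Stand-in for a learned dynamics model: predicts the next state
    from the current state and an action. Here, a state is an (x, y)
    position and an action is a (dx, dy) step."""
    return (state[0] + action[0], state[1] + action[1])

def imagined_rollout(model, state, actions):
    """Roll an action sequence forward through the model, never
    touching real hardware."""
    trajectory = [state]
    for action in actions:
        state = model(state, action)
        trajectory.append(state)
    return trajectory

def best_plan(model, start, goal, candidate_plans):
    """Pick the candidate whose imagined final state lands closest
    to the goal (Manhattan distance)."""
    def final_error(plan):
        end = imagined_rollout(model, start, plan)[-1]
        return abs(end[0] - goal[0]) + abs(end[1] - goal[1])
    return min(candidate_plans, key=final_error)

candidates = [
    [(1, 0), (1, 0)],           # two steps right
    [(0, 1), (0, 1), (0, 1)],   # three steps up
]
chosen = best_plan(world_model, start=(0, 0), goal=(0, 3),
                   candidate_plans=candidates)  # picks the three-steps-up plan
```

This is why a world model speeds training up: the expensive part, trying actions in the real world, is replaced by cheap imagined rollouts.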

Jon

Yeah, very cool. Some people think that the realization of AGI will require robots to be able to kind of have an embodiment and explore the world, you know, as opposed to just being a, you know, digital-media-in, digital-media-out, um, kind of model. And interestingly, this kind of paradigm, where you have a world model simulator, could potentially accelerate that ability of an AI system to be able to explore, you know, 'cause it's very expensive to be actually out in the world exploring. Um, and so, yeah, if you can do that virtually somehow, like this appears to do, then yeah, it could really speed robot training up.

Andrey

And the last story: AI tool cuts unexpected deaths in hospital by 26 percent, according to a Canadian study. So this is about an early warning system called ChartWatch, which has led to a 26 percent drop in unexpected deaths among hospitalized patients at St. Michael's Hospital in Toronto. This system monitors changes in a patient's medical record and makes hourly predictions about whether the patient is likely to recover or to deteriorate.

And so this includes about a hundred inputs from the patient's medical record, including vital signs and lab test results. It can alert doctors and nurses to patients who are getting sicker, require intensive care, or are on the brink of death, uh, requiring intervention.
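To make the description concrete, here is a hypothetical sketch of how such an early-warning system can work: each hour, the patient's inputs are combined into a single deterioration-risk probability, and an alert fires above a threshold. The feature names, weights, and threshold below are invented for illustration and are not ChartWatch's actual model.

```python
import math

# Invented example weights and baselines; a real system would learn
# these from historical patient records (~100 inputs, per the story).
WEIGHTS = {"heart_rate": 0.03, "resp_rate": 0.10, "lactate": 0.8}
BASELINES = {"heart_rate": 75.0, "resp_rate": 16.0, "lactate": 1.0}
BIAS = -3.0
ALERT_THRESHOLD = 0.5

def deterioration_risk(observations):
    """Logistic risk score from each input's deviation from baseline."""
    z = BIAS + sum(
        weight * (observations[name] - BASELINES[name])
        for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

def hourly_check(observations):
    """One hourly prediction: a risk probability plus an alert flag."""
    risk = deterioration_risk(observations)
    return {"risk": risk, "alert": risk >= ALERT_THRESHOLD}

stable = hourly_check({"heart_rate": 78, "resp_rate": 16, "lactate": 1.1})
deteriorating = hourly_check({"heart_rate": 130, "resp_rate": 32, "lactate": 4.5})
# stable -> no alert; deteriorating -> alert
```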

So according to the study, they looked at more than 13,000 admissions to the internal medicine ward and compared that to admissions to other subspecialty units, and it seems that it definitely helped in the context of the internal medicine ward.

Jon

Yeah, this is a big deal, and I love that it's coming out of... I grew up in downtown Toronto, and I went to St. Michael's Choir School, which is adjacent, it's next door to St. Michael's Hospital, and, I don't know, that's like a weird personal connection to this that just makes me think, wow, that's cool. And, uh, especially because this 26 percent drop is huge. I mean, a quarter of unexpected deaths.

And if you think about this as an early iteration on this system and, you know, stretching forward, when you see these kinds of systems work well, like we're seeing here from ChartWatch, and you say, okay, we have a hundred inputs at this kind of level of granularity, we could be recording so much more data in hospital systems and training, you know, better and better AI systems. And so this is like just the beginning.

So to already see this 26 percent drop in unexpected deaths with what is probably a prototype working on relatively limited data is really exciting. And I think it kind of ties into a world that I think we will have in our lifetimes, where a huge amount of data is being collected and monitored, not just in emergency rooms, but in your bedroom, in your home, you know, allowing us to have a really great sense of our health and get these early warning systems on, you know, people being on the brink of death, and interventions being able to happen.

Andrey

Exactly. Pretty exciting. Worth noting, just as a caveat, this of course is only one hospital, so we do need more research on this. The data was also collected during COVID to some extent; it was November 2020 through June 2022. So a little bit of a different context, but that's what you get. Like, this is a year-and-a-half-long study collecting real data at a real hospital. So it's certainly a very positive signal on that front.

And now to policy and safety. We are going back to a topic we've been talking about more and more, which is how will you actually power all of these data centers, which require a whole lot of energy, much more than the grid has been used to supplying to data centers. And this next story is providing one answer. Apparently, the Three Mile Island nuclear plant will be reopened to power Microsoft data centers. Three Mile Island is a bit notorious.

It was the site of the worst commercial nuclear accident in U.S. history. And it seems that, uh, there was a power purchase agreement signed with whoever controls this to enable this, uh, nuclear plant to provide energy. They have an agreement for 20 years, and the plant is expected to reopen in 2028 and be renamed the Crane Clean Energy Center. This is reopening, to be clear, from it having been shut down in 2019 due to inability to compete with cheaper energy sources.

So this is not sort of like reopening a fully shuttered plant from decades ago. This is more of Microsoft investing to get another source of energy, uh, in addition to, you know, I guess non-nuclear energy.

Jon

Yeah, hopefully that thinking for months that o1 and more models like o1 will soon be doing will help us realize nuclear fusion energy in short order. But in the meantime, nuclear fission is out there as one of the best energy sources we have. It's interesting.

There are some countries, like Germany, that are really opposed to having nuclear energy, but it's a great part of an energy mix alongside solar and wind, because it can provide power consistently. You know, with solar and wind, you can't always guarantee you're going to have sunshine or wind. Um, you know, you can have batteries, uh, and those are getting better and better as well.

But having nuclear fission as your backup, as opposed to, say, oil or gas, uh, generators, is obviously way better for the environment, at least in terms of carbon dioxide. Yes, with nuclear fission, you have nuclear byproducts, but we're pretty good at managing those, and we're pretty good at managing risks with nuclear power plants as well. Um, you know, newer generations of power plants don't have any, uh, any record of issues like we did with Three Mile Island.

Andrey

Exactly. Next, a story related to policy: Governor Newsom signs bills to combat deepfake election content. So we've been covering SB 1047 a whole lot. That's a big regulation bill, but it turns out there are other bills related to AI that are happening, and there's been a slew of them that have been signed into law in California recently. So these ones in particular are related to deepfake election content.

They include AB 2655, which would require large online platforms to remove or label deceptive and digitally altered election-related content and provide mechanisms to report this content. AB 2839 expands the time frame in which entities are prohibited from distributing deceptive AI-generated election material. So that's presumably addressing, you know, in the context of an election, when, uh, you are meant not to, uh, do this kind of thing.

It also expands the scope of existing laws to prohibit deceptive content. Last up, AB 2355 mandates that electoral advertisements using AI-generated or substantially altered content feature, uh, disclosure. So there you go, a whole, you know, little set of bills related to AI-altered content for elections. We've covered, you know, not too many stories related to deepfakes and elections, but there have been some. So perhaps looking ahead at emerging trends.

Jon

Yeah, I mean, did you know that Kamala Harris has never had anyone in any audience on any talk she's ever given?

Andrey

Yeah. It's the AI.

Jon

It's the deepfakes. 100 percent AI.

Andrey

Exactly. And next, a couple more bills, actually, not related to elections and deepfakes. Uh, these ones are about the digital likenesses of performers. So there's a bill, AB 2602, that would require contracts to specify the use of AI-generated digital replicas of a performer's voice or likeness, uh, informed by the historic strike by SAG-AFTRA and some of the negotiations, uh, related to that. So I guess California does have Hollywood.

It makes sense that there are actual, uh, legal precedents being established here for this kind of thing. I'm curious, uh, Jon, you are a podcaster, you have a lot of data. Were you thinking of doing a digital replica at all?

Jon

We are actually in the midst of exploring having the Super Data Science Podcast be broadcast in other languages. So we're looking at having Brazilian Portuguese, Spanish, and Arabic, um, additional versions of the podcast to get us out there to the billions of people in aggregate that speak those languages. And so it could, you know, vastly increase our audience, and the tools are starting to get pretty darn good at it.

So yeah, it's something, it's one of those things that kind of feels like... like I always kind of feel overwhelmed by things that are going on, and I'm like, oh man, this seems like a lot to do here. You know, you're then talking about, you know, whole new YouTube channels and podcast RSS feeds. But, uh, yeah, a lot of potential listeners out there that you could be making an impact on. So yeah, definitely something to explore here.

Andrey

Yeah, yeah, for sure. I think I mentioned this once before on the podcast: I have played around with ElevenLabs, the text-to-speech generator, and fed it like three hours of data from these kinds of recordings and got a pretty good replica. So if you wanted an AI version of yourself, you could definitely give it a try. And one last story: the startup behind the world's first robot lawyer to pay $193,000 for false ads, according to the FTC.

So this is a little bit of a funny story, but I guess also a serious story. Uh, apparently the Federal Trade Commission has taken action against the startup DoNotPay, which was advertised as the world's first robot lawyer. And the FTC found that this company, DoNotPay, conducted no testing to verify that its AI chatbot's output was equivalent to a human lawyer's level, and did not hire any attorneys to validate its legal claims. So DoNotPay is paying this fine of $193,000.

It doesn't seem huge. And, uh, there's a 30-day public comment period, presumably. And they have also agreed to inform consumers who subscribed to the service over, uh, the last couple of years about the limitations of these features. Uh, they are also prohibited from making baseless claims, uh, related to the service being able to substitute for professional lawyers.

Jon

You gotta think that they're getting way more than $200,000 worth of media exposure here. This is one of those no-news-is-bad-news kind of situations.

Andrey

No, I think, I think any news is, yeah, any bad press is good press kind of thing. Yeah. Uh, I think the bigger thing aside from the fine is the prohibition on making baseless claims related to, like, AI being able to replace a lawyer. And it seems that this is part of a larger initiative, uh, called Operation AI Comply, aimed at cracking down on deceptive AI claims. So that's kind of a major problem, is people making claims about AI tools that are not true.

Jon

Really weird, uh, company name, DoNotPay. I mean, I guess it's kind of, so they bill themselves as an AI consumer champion: they use AI to help you fight big corporations, protect your privacy, find hidden money, and beat bureaucracy. So yeah, I guess it kind of came from appealing parking tickets to start, and I guess that's kind of where the name came from. So, like, do not pay the fine. But it looks like they will be paying theirs: $193,000. Nice, nice.

Andrey

Onto the last section, synthetic media and art, and we got actually a couple pretty major stories, I would say, in this one. Starting with: Snap is introducing an AI video generation tool for creators. So Snapchat, if you don't know, has a sort of feature similar to Instagram or TikTok, where you do have creators posting, uh, posting videos and various skits and whatnot. Apparently this tool will allow creators to generate AI videos from text prompts and, in the future, also image prompts.

The tool is in beta and available to a small subset of creators, but presumably will be expanded later on. And apparently this tool is powered by Snap's own foundational video model. So, uh, yeah, they are integrating direct AI video generation for anyone to use. And as a result, they are going to use icons and context cards to inform users when AI is used to generate content and include a watermark for any AI generated video.

And moving right along, there's basically an equivalent story, where YouTube Shorts, which is similar again to TikTok and Instagram, will be integrating Veo, Google's AI video model. And this was just an announcement. So creators there will be able to integrate and use AI video generation and some of the other features of Veo to edit, to remix, uh, and, uh, generate six-second-long standalone video clips. So video generation coming to all sorts of platforms, basically.

Jon

Yeah, it's exploding. It's one of those things where a year ago on this show, you guys were talking about how that's the next frontier, and a year later, we're figuring it out. Still plenty of room for improvement, but it's getting better all the time. Yeah.

Andrey

And one last story: Lionsgate signs a deal with AI company Runway and hopes that AI can eliminate storyboard artists and VFX crew, at least according to this story. So we started with Runway early on, and we are closing with Runway. There is now this deal where AI research company Runway will provide them with an AI model based on the film and TV content of Lionsgate. And Lionsgate is a production company; they have movies such as John Wick, I believe, and quite a few other ones.

And they are going to try and see if they can replace VFX artists to create backgrounds and special effects, for instance. Uh, you know, very preliminary. They're gonna see if this can be used in pre-production or post-production, uh, but, uh, they seem to be a bit ahead of others in the sense of trying to train, uh, actually custom models on their own data. Alrighty, well, that's it.

We are done with another episode that hopefully will come out just a day or two after we record this one, without any more delays. Thank you for listening as always. Uh, thank you for commenting, reviewing, and maybe subscribing on the Last Week in AI Substack. As always, we appreciate your views, your comments, and so on. And we appreciate you, Jon, for filling in for Jeremy and being a great guest co-host.

Jon

Seriously, it's my delight. The Last Week in AI podcast is the only podcast that I always listen to, and so it's an honor always to be on here. Uh, I remember the first time it happened, I was like, holy crap, I'm going to be able to hear myself on my favorite show. And I'm glad that you have kept bringing me back. I absolutely love it. It's so much fun. And yeah, kudos to you. It's a huge amount of work that you put in, uh, every week, Andrey.

I'm sure everyone who's listening appreciates it, but it deserves highlighting. I mean, from curating just the amount of stories that you've got to pass over, to then deciding and curating the list that you cover, organizing it, making sure there's no duplication. And that's just before you start recording: setting up recording slots, doing the recording, doing all the post-production. It's wild how much you do every week on the show, Andrey, so I appreciate it.

And I'm sure all your listeners out there do.

Andrey

Wow. That's some nice praise. And, uh, you know, I guess it is a bit of work, but it's always fun and I'm glad that people do enjoy it. So please do keep tuning in if you do enjoy it and please enjoy this AI outro song that presumably will have some soft piano or maybe metal. I'll try and follow up on the recommendation from our YouTube commenter to try one of those things.
