[Intro theme music]
Hello and welcome to the Last Week in AI podcast, where you can hear us chat about what's going on with AI. As usual in this episode, we will summarize and discuss some of last week's most interesting AI news. And as always, you can go to lastweekin.ai for the stuff we didn't cover in this episode, and also for the links to all the stories that come with this episode. I am one of your hosts, Andrey Kurenkov. My background is that I studied AI at Stanford, and I now work with generative AI.
And once again, Jeremy is still on paternity leave, as regular listeners know. So we have another guest co-host, another returning guest co-host actually, and I will let you introduce yourself.
Hey everybody, this is Gavin Purcell. I am one half of AI For Humans. Our podcast and YouTube show specializes in trying to demystify AI for a more mainstream, entertainment-based audience, but we also do a lot of fun, creative experiments along the way. My partner, Kevin Pereira, and I have been doing it for about a year and a half now. So I would say, Andrey, that we're almost experts. We're getting there. We're getting there.
Yeah, I would say so. Definitely experts in the realm of trends in AI, especially outside of research and technical stuff.
Yeah, the creative stuff. I think that's one thing I've talked a lot about, where our sweet spot is: if you're interested in creative stuff, we both come from creative backgrounds. I worked in TV for a long time; Kevin also did a lot of TV. So we do a lot of weird experiments with the creative tools and try to show people how to use them in fun ways.
Yeah, exactly. Your background is creative, also entertainment. And I will say, compared to this podcast, it's probably a little bit more entertaining for sure.
Listen, I listen to this podcast to get the technical details down, and sometimes I re-synthesize it in ours. So they both have their use cases.
But I also am a fan of the podcast. I love the AI co-hosts and the little experiments. So for any listeners, if that sounds appealing, do check it out.
If you want to learn more about our show, go check out aiforhumans.show. That's our website. You can get all the links there to YouTube and the podcast.
Yep. Or just, you know, you can go to YouTube and search there and find a bunch of cool clips as well. Yes. And real quick before we get into the news, do you want to respond to some comments? We had a cool correction on YouTube, which is always nice: hiding secret messages in plain sight is steganography, not stenography, according to a cryptographer, which I did not know. So that's interesting.
And another one commented on us being back on schedule, and it no longer being a time warp to listen. Hopefully we'll stay on schedule; we'll see if we manage. And a couple of nice reviews also on Apple Podcasts from extant pensis and nerd planet. I always find these names on Apple Podcasts pretty entertaining. So thank you for all the reviews and comments, always appreciated. Feel free to go to YouTube, to our Substack, et cetera.
And with that out of the way, let's get into the news. This week actually is a great week for you to co-host, Gavin, because we have quite a bit of news related to creative stuff and tools people can use, and not so much the more technical, open source stuff that is sometimes the focus. And we begin in Tools and Apps with, I think, the biggest story of the week, at least for me, which is Meta announcing Movie Gen, an AI-powered video generator.
So this is pretty much Sora, I would say, from Meta. And it's actually not just a video generator. The paper they put out is called Movie Gen: A Cast of Media Foundation Models, kind of a cute title. So in addition to generating video, it can also do video editing. It can modify video in certain ways, like swapping out objects, et cetera. They also have a different model for generating audio for a video, for a clip. And they have just a lot of examples of ways you can use this.
And the details here, we can get a little bit into it. The model can generate 16 seconds of video at 16 frames per second. They compare it to all the current models, Runway Gen-3, Sora, Kling 1.5, Luma Labs, and it just blows them out of the water. I'll try to edit in some clips as we're talking here, and it looks really good. So I'm curious to hear your thoughts on it, Gavin.
It looks really good, and I think it's really awesome. But just like Sora, it is not available to try. And I think this is a really important aspect of both this and Sora, because I've spent a lot of time with these tools, Runway Gen-3 specifically, but also with Luma, Kling, and Minimax, Kling and Minimax both being Chinese models. And yeah, it looks amazing.
I mean, the one thing I'll say that's really different about this, versus what Sora has shown and really what any of the video models have shown to date, is the inpainting feature. The inpainting feature they're showing off here is really transformative for the use cases of AI video. One of the coolest clips they show is a guy who's running in this desert landscape, and they can change him, or they can change the landscape around him.
And that's something that's been really hard with AI video today, because it's very much that slot machine mechanic where you're like, okay, let's see what we get here. Even with image-to-video, you're not really sure what you're going to get. If you're going to start being able to control this a little better, that's great.
Among the examples they've shown of this so far, at the top of their blog post there's an amazing, almost Moo Deng-like baby hippo that's swimming in the water, which you can show here, which is really cool. The thing that's interesting to me about this, and I'm curious to hear your take on it, is, you know, Runway themselves have talked about this, Sora's talked about this, and I think Meta Gen is getting at this as well.
Sorry, Movie Gen is getting at this as well too. It's this idea of AI video as a world simulator and not just as a video simulator, right?
And I think there's a lot to be said for, and this may be getting into the technical side of the weeds a little bit, but the idea that instead of LLMs training on words, soon we're going to be training, we're already training on images and video, but training on the real world, what the real world looks like and how physics works, will maybe create a much bigger, more interesting model at large.
And so these kinds of high-end movie generation models feel like a step towards that. As a video gamer, the holy grail is you go into an environment and you say, I want to play X, Y, or Z, and I want it to look like this. That's a really cool use case. Now, it may take half the resources on the planet to generate that video game, I don't know. I think this is a really interesting step in that direction.
I think the other thing that Kevin and I talk a lot about on the show, about the entertainment industry but really at large, is that there are a lot of people who are what we call the AI nevers, right? The people that never want to use AI. And I think there's a message that we always try to send both to those people and to the people that might be a little more curious about it, which is, you know, this is not going to slow down.
So in a lot of ways, you're better off being aware of and open to these tools as tools, because it's only going to get crazier from here. That's kind of our takeaway.
That makes sense. And yeah, to your point, the general idea of training on video is something that seems to be in the future of these multimodal models, right? We have GPT-4o from OpenAI, which was trained on audio and images and so on.
And it seems like video is probably one of these things that they could tap into more. It's not super clear yet if you get compounding effects, if training on video also improves your reasoning in text, but it isn't an unreasonable theory, and there are some indications that could happen. So yeah, video is very much still the future and present of the frontier of AI. And I think also, as you said, it's worth pointing out that this is not real time, first of all.
It does take a long time to generate videos, as was the case with Sora. So the comparison to Runway Gen-3 and Luma and these actual tools isn't entirely fair, considering those are meant to actually be used. And in fact, there's no release, partially because they have stated that it's just too early, it's too slow, et cetera. This is more of a preview.
I'm really curious to see where Sora is at, right? Because Sora is something we've now known existed for a while, and as far as I can tell from research in the background, it has really been around inside OpenAI for probably a year at this point. I'm surprised it's not out in some form yet, because it is kind of shocking to me. Now, they may be like, we're already burning our servers out running o1,
so we can't release a video model at this point. But based on, you know, Sam's new ship mentality, where he's shipping stuff quite a bit, I kind of expect Sora to come out post-election this year, before the end of the year, because I don't think OpenAI wants to seem like they're falling behind.
And I also think, I know you were probably following this story too, but from what we've been seeing, there's an image model in there too, right? So that might be an easy way to update DALL-E to a DALL-E 4, or you just release it as part of Sora. I do think that's coming soon. This Meta model feels like it's a little bit further off, but it just shows you, again, Meta is throwing so much money at this stuff and doing so many things that they are going to be a big player.
And, you know, Llama 4 is not that far away. You have to just imagine the size of it; an open source model that's going to be that good is going to be kind of a game changer. Exactly.
And to the point about the open source aspect of this, there have been questions of, well, are we going to get the weights of this as we have with the Llama models? No weights so far, and no real promise of an open release, which I don't find surprising. This is actually something that, let's say, Google or other competitors don't have, as opposed to large language models.
So for competitive advantage, this is certainly something you might want to keep to yourself and not share with everyone.
One last thing on this. I think what's interesting is this fits very much into the sort of stuff that Zuckerberg has been talking about with their image models, in that one of the things they also highlighted was the idea that you can take a picture of yourself and put it into a video clip. And, you know, we've seen a lot of open source things that can do that. We've seen a lot of other tools, like FaceFusion and things like that, that exist on the open source side.
This is a big aspect, I think, of what Meta wants to do. I think what Meta really wants to do is make these tools available in their apps, like Instagram or WhatsApp or, you know, Facebook. My theory about Zuckerberg is that it's less about being on the absolute cutting edge and more about kind of undercutting everybody else and getting people to be on the Meta apps. So I think for him, this is another onboarding tool, right?
Like, if you can get a viral video of your grandmother in 300, you know, raising her sword, and send it around, that's a pretty good thing. Now, it won't be exactly 300, because that's a movie with rights attached to it, but you can make her into a Greek soldier kicking somebody off a ledge.
Well, you know, that's an interesting question
of, did they use any sort of copyright-ish data? They do say they've trained on public and licensed data. So presumably you can't make yourself into Iron Man, but who knows? Maybe. You have to go to the Chinese models for that.
The Chinese models have no problem with that
whatsoever.
Very true. Yeah. And I think that's a very good point. In fact, I noticed their blog post about this is very long, and if you want to check out all the 90 pages,
there's the downloadable one, which is crazy.
The paper is, by the way, like 60 pages of content and 30 pages of authors. Not really, but it's a super detailed paper, which I think is also very exciting on the research front for the community, and on the open source front too. There are actually a lot of technical innovations going on here, which is interesting.
And to the point about this being integrated into their tools, I found it interesting that in the blog post they finish with this quote: "Imagine animating a 'day in the life' video to share on Reels and editing it using text prompts, or creating a customized animated birthday greeting for a friend and sending it to them on WhatsApp."
Yeah, there you go. They're just telling you right away: we want to have this.
They're killing JibJab, Andrey. They're killing JibJab. That's what they're doing. Do you remember JibJab?
I do not.
JibJab was an old app, for those in the audience who remember, thank you, where you would basically put your head on these little dancing characters; originally it was elves, and then it was a bunch of other stuff. It's just an easy way to create shareable material, which I think is really Meta's bread and butter. Right. And to me,
it's kind of interesting. Last week we covered the news that Veo, Google's video model, would be on YouTube, right? Yeah, it'd be on YouTube. And Snap is also doing this. So this seems to be another standard playbook move for all these creation tools, creative tools, adding this. Presumably TikTok will soon have their own thing as well, it seems like.
Yeah. I mean, I think we're talking about Pika later, which is a really interesting example of this as well.
And moving on to the next story, we've got one related to OpenAI. They're launching canvas, a new ChatGPT interface tailored to writing and coding projects. So this is sort of on top of your typical ChatGPT experience. You're still talking to a chatbot, but you can think of it as kind of a workspace for writing and coding.
You can generate writing or code directly in that workspace and then have the model edit selected sections of it. So instead of, I guess, going back and forth and getting just a text chat interface, you now have an actual draft of what you're working on, and you're almost collaborating with it, in a sense, on editing that draft. And this is similar to something we've already discussed, Anthropic's Artifacts, and also the tool Cursor, which is an interface where you're not just chatting; the AI can modify the code, not just generate on top of what you already have. So this is another trend we're seeing. This seems to be a new user experience paradigm with these chatbots, and a pretty significant step in making it more fluid to do whatever you're doing, instead of just going back and forth with a chat interface.
The thing that's really interesting to me, and I know for coders it's a huge deal because you can change specific things, but as somebody who uses it a lot to write things, whatever, blog posts, or to brainstorm: one of the things that I love about this, because I spent some time with it, is that instead of having to regenerate the entire document all over again, you can just regenerate sections.
One of the most annoying things about using any LLM to do this sort of thing is that when you say, okay, try this and do that, but then give me another version of it, when it spits out the other version, a lot of times it still changes, in some way, the other things you didn't want it to change. So what's cool about this is, say you're writing a, I don't know, a cover letter for something, right?
You can choose a section of it and say, help me with this section, but not the rest of it. And it's able to do that. And that feels like, from a UX perspective, a massive step up when you're playing with this thing.
I also think the other thing this lends itself to is, when we get an agent that can go in and do something for us, if you were to tell it, you know, give me this document, but change this line, I think it's going to be able to do that in a much more specific way than it could have before. If it can break it down this way, in a weird way it's a little bit like step-by-step, but within the thing, do you know what I mean? Like within a specific thing.
And that does seem to give you a significant leg up when you're wanting to modify something.
Yeah, I totally agree. Having been using Cursor, this sort of similar tool, only for coding, the ability to highlight a piece of text and tell the AI exactly what you want to do there, so you can highlight a paragraph and say, make this paragraph more concise, for instance, is a much better experience than just having it re-output the same stuff over and over.
And imagine with coding, because, you know, bugs happen in certain sections, doesn't that hugely step up the ability to fix parts, right?
Yeah. And it also speeds up a lot of the boring things you sometimes have to do. For instance, let's say you have a list of numbers referencing a list, and then you take out a bit of the list, and now you need to decrement every number, like ten in a row. It used to be you would have to actually do that one by one, and now you can just highlight that and say, you know, reduce every integer by one. Oh, that's great. That's very cool. Yeah, exactly.
So that's kind of a big win for me. It can help a lot with fixing things, but also with just doing the boring stuff that in the past you would have to do yourself.
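(For the curious, here's a minimal sketch of the kind of edit being described, the tedious renumbering itself; this says nothing about how canvas implements it, and the inputs are made up for illustration.)

```python
import re

def decrement_integers(text: str) -> str:
    """Decrease every integer literal in the selected text by one."""
    return re.sub(r"\d+", lambda m: str(int(m.group(0)) - 1), text)

print(decrement_integers("items[10], items[11], items[12]"))
# -> items[9], items[10], items[11]
```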
Do you think this, and Anthropic's Artifacts, undercuts Cursor as a business? Like, is this what Cursor does? Or do you think Cursor still has significant runway to go? Because one of the other things we talk about on our show a lot is how companies like OpenAI and Anthropic tend to kill startups whenever they do these announcements, right? Like, whatever the next step is.
I know Cursor's got a ton of funding now, but do you feel like it's got special sauce of its own to keep going?
I think that's definitely the case for Cursor, just because they are only about coding, and it's integrated into... you know, as a coder, you always have a program in which you edit your code. Exactly. So yeah, ChatGPT is a separate thing; it's not tied to your file system, so it wouldn't be a real game changer there. I do think for writing assistants, and there are a few of these for creative writing and so on, this could be a pretty big, let's say, challenger. Yeah, that makes sense.
And next up, we have another story about OpenAI, this time less about a usable product and more for devs. OpenAI had a DevDay where they introduced a bunch of developments aimed at software engineers. The most exciting part was a Realtime API, where you can have nearly real-time speech-to-speech experiences in your apps, presumably similar, to some extent, to what they have already done with GPT-4o and being able to generate audio in real time.
On top of that, there's a bunch of other things they announced. We don't need to go super into detail, but they're reducing some costs, and they're introducing vision fine-tuning, which is kind of important because you can fine-tune on things related to images, which is already the case for their text models. By the way, if you have a company, you have your own data, you have your own use case, you can pay to customize
ChatGPT to your case. And some other stuff. But the highlight here, I suppose, is this speech-to-speech offering.
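(A minimal sketch of what talking to a realtime speech API looks like over a websocket. The endpoint, model name, and event names below are based on OpenAI's announced beta and may well change, so treat the details as assumptions rather than a definitive implementation.)

```python
# Sketch: open a Realtime API session and ask for a spoken response.
# pip install websockets; endpoint, header, and event names are assumptions.
import asyncio
import json
import os
import websockets

async def main():
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: the header kwarg is extra_headers in older websockets releases,
    # additional_headers in newer ones.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model to respond with both audio and text.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {"modalities": ["audio", "text"]},
        }))
        async for message in ws:
            event = json.loads(message)
            # Audio arrives incrementally, e.g. as base64 delta events.
            print(event.get("type"))

asyncio.run(main())
```

The key point is the streaming shape: audio goes out and comes back as incremental events over one socket, which is what makes a no-pause, interruptible conversation possible.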
Yeah. I mean, the realtime voice API, we dug into this a little bit, because Kevin and I are actually working on an idea using voice stuff, which I will hold on to for now, but I'm pretty excited about it. So this is very exciting, mostly from the standpoint that OpenAI's Advanced Voice does seem to be the cutting edge, at least of what's out there in the world right now.
The thing that's interesting about this is being able to implement it into existing apps, or new apps that would use it. Right now it's very expensive. I think one number that came out is it's something like a quarter for... I can't remember what the number is, but it's very expensive to use on a regular basis for a regular consumer app. But as we know, this stuff just gets cheaper over time.
And I think, you know, a year from now, there could be exceptionally cheap voice-to-voice, without a pause. If you've used the OpenAI Advanced Voice app, what's really great is that it's listening as it goes along and it will respond to you right away. And it does feel kind of magical.
I think my personal theory, and I think this is just going to go further in this direction, is that voice is going to become a massive input for everybody, right? Meaning that up till now, we got used to using voice for Siri and maybe for Alexa, but just for the most basic things. Now, with Whisper or any of these tools, it can really transcribe exactly what you're saying, so it can get exactly what you want.
And then if this interaction happens, and again, going back to this idea of agents becoming real in the world of AI, if you can say to an agent, hey, can you write an email to this person about this, and it can just do that, and then you can glance it over and say, okay, great, send it. That is a really, fascinatingly great use case of what AI can do. And the maybe larger, broader thing is, I think this is going to change society in a really weird way.
I think we're going to look at a world in about ten years where everybody's going to start talking to these AIs, and they're going to humanize them, because you're going to be spending time talking to them. I think this is going to speed up the idea that AIs are something other than just a computer, in a weird way. And I also think it's going to change how we interact with the devices in front of us.
When you pair these with what Meta did with their new glasses, I see a world, and granted, I know a lot of people out there might be like, you're freaking crazy, but I see a world without keyboards, right? And that is a really weird world, because we've had the keyboard as our input device for a very long time. I mean, already on our phones we're getting used to typing on this little thing, and everybody thought we would never do that.
But I see a world where voice is probably the main input that we have into computers going forward. It feels like we're laying the groundwork for this.
I agree, certainly to some extent. I don't know about keyboards; you know, as a programmer, that sounds radical.
But also, you know, if you can tell your AI to do X, Y, Z, and you can tweak it a little bit, it's not the craziest thing.
And I definitely do agree that voice is going to be a primary modality for interacting with AI. Even already, to some extent, with GPT-4o we're kind of moving in that direction, and certainly in the coming years. And it's interesting; in some ways, we've had pretty good transcription of audio for a while. On our phones, instead of typing, you could just say something to be more efficient.
Personally, I haven't been doing that somehow; the habits haven't changed. But I could see it becoming more of the case as we get these smart glasses and other things. We'll all start talking to ourselves in public.
Which is what's going to be the weirdest thing. Instead of people talking on the phone, which they've kind of stopped doing, I think you're going to see a lot of people talking, but they're going to be talking to AIs, which is going to be a strange thing.
It's going to be interesting. Yeah. And next we are moving away from OpenAI to a bit of a startup: Black Forest Labs, which has released Flux 1.1 Pro and an API.
So a reminder: Black Forest Labs is a pretty fresh company from the creators of Stable Diffusion, the seminal text-to-image model. And they are, I would say, currently a leader, with maybe the best image generation model. We've seen that when they enabled people to play around with it through Grok on X. And so here we have the next iteration of their model, Flux 1.1. The images are kind of mind-blowing.
You know, before, you could still see some things in these images that gave them away, for photograph-type images, things that could be seen as real photos. Now it's getting pretty hard to find anything. And alongside that model update, they also released a paid API for developers. So now, if you're creating your own app, outside of something that's already out there, you can use Flux in it with the API.
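(For a sense of what "use Flux with the API" means in practice, here's a rough sketch of the submit-then-poll pattern these image APIs tend to use. The endpoint, header, and field names here are illustrative assumptions; check Black Forest Labs' docs for the real interface.)

```python
# Hypothetical sketch of calling an image API like Flux 1.1 Pro.
import os
import time
import requests

API = "https://api.bfl.ml/v1"  # assumed base URL
headers = {"x-key": os.environ["BFL_API_KEY"]}  # assumed auth header

# Submit a generation request; the API returns a task id immediately.
task = requests.post(f"{API}/flux-pro-1.1", headers=headers, json={
    "prompt": "a baby hippo swimming at golden hour, photorealistic",
    "width": 1024,
    "height": 768,
}).json()

# Generation is asynchronous, so poll until the image is ready.
while True:
    result = requests.get(f"{API}/get_result", headers=headers,
                          params={"id": task["id"]}).json()
    if result["status"] == "Ready":
        print(result["result"]["sample"])  # URL of the generated image
        break
    time.sleep(1)
```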
Yeah, I mean, Flux blew us away when it first came out. I was kind of shocked by it, and in my mind it has kind of taken the place of what Stable Diffusion was, because Stable Diffusion is kind of falling off. You know, we'll see what James Cameron and those guys do with it. But I consistently get the best results out of Flux.
Now, the interesting thing with Flux is, there are a few programs where you can pay a subscription and it's integrated into them, but Flux is often pay-per-use if you use it on Fal or on Replicate, all these different sort of server systems. It's amazing. And I think the truth of the matter is, I'm just thrilled that there's another company pushing image modeling forward. And the thing I'm most excited about is they teased a video model, right? They teased a video model when they came out with Flux 1.0.
If they can come out with an open source video model that is better than, say, Runway Gen-3, which, if you look at the Runways and the Lumas and all those sort of things that are out already, is kind of the bar, that feels exciting: another player pushing that forward. It's a cool company, I think they're doing interesting work, and again, it's a really great thing because it opens the door to these tools.
It's also powering Grok, which has been the case for a while, and it allows you to do certain things that you can't do in other image models, which is always interesting if you're trying to play with creative ideas. I don't know if you've seen the work of the Dor Brothers; they're the guys who made the video of Trump and Hillary Clinton and Kamala going into the convenience store, which a lot of people hated. But
that sort of thing, which is kind of a piece of art, right, is created with open source models, because none of the closed models will let you generate images of famous people. Although Midjourney will let you get close. Midjourney has the weirdest thing where it'll kind of mostly get you there, but not all the way.
Black Forest Labs and Flux have certainly come up really quickly since being founded. And this isn't just better, it's actually a lot faster. They say Flux 1.1 Pro delivers six times faster generation speeds, which, as someone trying to do a creative project or just playing around with it, is a game changer. Six times, yeah, crazy. It would definitely be very exciting to see them release something related to video. And they did just fundraise and get a bunch of money, so I wouldn't be surprised.
Oh, they did? How much did they raise? I'm curious about that. Do you have the exact number?
I don't have the exact number, but it was in the tens of millions. Yeah, yeah.
So they're going to be around for a while, then. That's good.
Yes. Next, moving away from generating stuff, we have another tool-related story: Microsoft is giving Copilot a voice and vision in its biggest redesign yet. There's a bunch of new features being added. It has this virtual news presenter mode, it has the ability to see what the user is looking at, and there's a voice feature for natural conversation, which is similar to OpenAI's Advanced Voice Mode. And this redesign is being pushed across mobile, web, and their dedicated Windows app. I don't know,
I forgot Copilot is a thing, to be honest.
Well, there's a bunch of Windows users out there, and by the way, I was in a Best Buy the other day and they are schlocking it hard on their PCs, right? I will say, this to me feels like... Microsoft has a giant investment in OpenAI, obviously; I think they have 49 percent of something of OpenAI.
I don't know if it's profits or ownership of the company. But it looks to me like they took the tools OpenAI announced and then dropped them in, right? What I mean is, they put all the OpenAI stuff into Copilot. One of the things that people may or may not remember is that Mustafa Suleyman, the guy that was the Pi co-founder, is now running Microsoft's AI division, and I think that's a big step here.
He's kind of redesigned Copilot to be a little bit more, you know, friendly and open to doing stuff. And I just think of Copilot as the kind of normie way into a lot of this stuff. Now, granted, I don't know what their audience is for this, and I know there have been some stories that a lot of businesses have tried using Copilot and it hasn't been that useful.
I think in some instances this will help it a lot, especially because, by the way, Apple Intelligence is still not out, and I have a new phone that I still don't have access to it on for now. It is going to come out, I guess, at the end of this month, finally. But Copilot is a very powerful service because it's powered by OpenAI. In fact, there's even a chain-of-thought feature, I can't remember what they called it, that they released
that's clearly o1 in some form being dropped into Copilot as well. So overall, this just makes sense to me. Microsoft has a big chunk of OpenAI; they're going to roll out those products under a different brand in the Microsoft world. And I think people will probably use it. I think this will get in front of a lot of people.
I think so, yeah. Maybe because I use a Mac and don't use Windows very much, I keep forgetting about it, but I'm sure it's pretty prominent in Windows and across anything you use from Microsoft. And I do agree, it seems like there's a bit of influence of that Pi sort of consumer-facing approach. It definitely looks more sleek and a bit more approachable compared to something like ChatGPT. So I think they're also trying to differentiate a bit from the other chatbots, where it's just a text box, you know.
I think the thing I would really love, because I do have, up in my closet here, my Microsoft gaming PC that I've used for AI stuff before, which I haven't brought down in a while just because a lot of the stuff I'm doing is in the cloud. And again, this may go back to it not being possible yet, because you need an agent that can operate on your behalf. I would love it to be able to do things on my PC for me, right?
And it wouldn't even have to do stuff on the internet. If I can tell it something as simple as, go update this piece of software, or not even that, like approve this piece of software, things like that, or even, you know, eliminate this file, that would be a really huge benefit to me. And I know at one point Microsoft was really trying to get into that. Finding ways to make it useful feels like the next step here. Like, okay, it's cool,
I can talk to it and I can make it say different weird things, but right now the use cases are not that strong. And I don't think they will be that strong until it has some sort of ability to manipulate files or do stuff for us.
Yeah, I totally agree. I think the next step would be something like: in this folder, delete all image files, or rename each file by taking out this little piece of text. Something I've had to do, you know, the annoying repetitive tasks. That's going to be AI pretty soon.
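(Here's a minimal sketch of that exact kind of chore, just to make it concrete; the folder path and the substring being stripped are made up for illustration.)

```python
# Delete image files in a folder, and strip a substring from other names.
from pathlib import Path

folder = Path("~/Downloads/project").expanduser()  # hypothetical folder

for f in folder.iterdir():
    if f.suffix.lower() in {".png", ".jpg", ".jpeg"}:
        f.unlink()  # delete image files
    elif "_final_v2" in f.name:
        # Rename by removing the unwanted piece of text.
        f.rename(f.with_name(f.name.replace("_final_v2", "")))
```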
And the last story: I think you mentioned Pika earlier, and now we are covering their news. They have released Pika 1.5. So just a reminder, Pika Labs is an AI video platform, one of these things comparable to Luma and also Runway Gen-3. And with 1.5, they're saying they're focusing on hyperrealism. In particular, they highlighted Pika effects, which are things like lifelike human and creature movements and sophisticated camera techniques. And I think for me, the coolest part was seeing some of these videos with physics, seemingly, things like smoke, and things dissolving, being kind of liquidy. It was pretty smooth.
I was pretty impressed by this.
Yeah, so there are a couple things going on here. One thing: Pika released a new model, which is great, and you always want to see these AI video companies coming out with new models and pushing forward. But Pika has always been a little behind some of these other models in terms of quality, right?
The thing I think that's really different here, and I think a smart move on Pika's part, is they've pivoted slightly into these, almost, I'd call them AI animation templates. There are about six of them that they call Pika effects.
And this is slightly different from what's at the beginning of their trailer video, where they're talking about the ways you can manipulate the new AI video model. These are specific animation types you can apply to stuff: one of them is called inflate, one of them is melt, one of them is cake, so you can make things out of cake.
So what's cool about this is you can take any image and it will give you a similar effect; it kind of knows what the effect is. It's almost like a video LoRA in some way, which is kind of cool. And I think the smart thing for Pika is, it's a little bit like when Viggle got popular, right? If you remember Viggle, the company that made the Lil Yachty kind of jumping-on-the-stage thing, and everybody did those animations.
If you can find something that will go viral for you, it's a massive selling point. But again, going back to the Meta stuff, the way that you get people to use this is by sharing it, right? And some of these AI video models are looking to be high-end tools for filmmakers, or even AI filmmakers, who are going to spend all their time to make a two-to-five-minute film, or maybe longer at some point, a really compelling video to watch.
Other ones are really about capturing that kind of casual user who's going to make something funny with their image. That feels like the right direction for Pika to go in, to me, because in some ways I don't think Pika is going to beat out the Lumas or the Runways of the world. It did raise a lot of money already; I think Pika raised something like 70 million dollars.
And that feels like a raise for somebody that's trying to be the actual AI video model generator. But this could be a really interesting path if they can grow and get big enough. It feels like, I don't know where this company goes from here; maybe there's a world where they can make this into something, but it feels like it's going to be a tricky pathway. But I really do love these new effects. I think it's worth trying, and anybody can do it for free.
That's the coolest part about it. It will take you about a half an hour to get a result back, because the free generations are put way behind in the queue, but it's at pika.art and you can go try it for free. And we had a lot of fun with it. We actually did our thumbnails for our show, and you know, the inflate thing was really weird. It's just two of our heads; it inflated us and then brought us together
into some weird flesh bulb at the bottom, which was really disturbing, but it's fun to watch.
Yeah, that's a good point. Personally, with all of these video generation services, it does feel a little implausible that you could actually make a business on the consumer front. We've known OpenAI has gone and had meetings with Hollywood people, and I do think part of the reason they haven't released Sora is that it wouldn't be a real moneymaker from a consumer perspective.
So with Pika, this idea of Pika effects, maybe something more filter-like, which people already use a ton of, could be a smart play.
Well, and also, you think, could a company, maybe not ByteDance, but somebody who's got something like Snapchat, come in and buy Pika? Yeah, sure. That would be really useful, because then you turn this whole group on to making more effects for you.
I just think it's an interesting thing, because with Sora, and even with Runway, you can tell they're actually trying to push themselves in some ways towards Hollywood, right? Because worst case scenario, Runway and Sora could both make incredible drone shots, right? Or establishing shots that could probably be close to usable in movies right now.
They may be a few years away from making full movies from this stuff, but Pika feels like maybe it does need to go down that pathway.
Exactly, yeah. I think Runway in particular is building itself as, you know, not just video generation; Runway has a whole suite of AI tools
and all that stuff. Yeah.
Yeah, exactly. So there needs to be some differentiation in this space, and it feels like Runway is definitely the leader on the front of trying to be for creative professionals. And moving on to the Applications and Business section, we begin with another big story of the week, one that isn't surprising, but I guess is nice to have, and that is the end of the OpenAI fundraising saga. We've been talking about this on and off; we've covered a bunch of rumors.
Well, OpenAI has now closed their VC round, and this is the largest VC round of all time.
Oh, is that right? I didn't know that. That's amazing.
Yes, exactly. They have raised $6.6 billion from various investors, which values them at $157 billion post-money. It has a bunch of investors, as you might expect, Thrive Capital being among the previous investors; they have a lot of recurring investors in this. Although there are some interesting details here.
Supposedly, they did ask the investors not to invest in other competing AI companies, which I guess makes sense, but is also a bit of an ask, given OpenAI does have pretty significant competition from Anthropic and Google and so on. But yeah, it seems like people have a lot of faith in OpenAI still, despite that competition.
Yeah, I've been thinking about this story a lot. We talked about it on our show as well last week, and there are two things that really stood out to me. One is, obviously, the size of this round is big, right? You can't imagine almost any other startup in the history of startups, as we just said, ever raising something this big without going public.
But two, I've also been thinking a lot about what this money is used for. Because one of the things that's famous about OpenAI, at least up to now, is that it's not making money overall, right? It's bringing a lot of money in, but it is still burning through money overall; its costs are higher than its revenue.
My thought is, okay, you've got $6.6 billion now to do whatever the next-generation thing is. I've been thinking about how far away we are from seeing what OpenAI has in its back pocket. Meaning, we're just now seeing o1. Supposedly the rumor is that o1 is what Ilya saw back in the day that freaked him out, and that was almost a year ago now, right? November.
And so my question is, what are they showing to investors? Because investors are going to get the open-kimono look at what the company is doing right now. They are not going to be throwing $6.6 billion at the company if they don't know what's happening. Clearly, the next step, whatever is behind the door right now, is a significant step.
And I would imagine, you know, they've got, I can't remember his name, the guy that came in as the new business guy, Brad something; I'm sure they are projecting out all these different business use cases for the software that don't exist yet, because there's no other way to imagine it getting developed at that high of a valuation without that. So I think this points to how big OpenAI's next leap could be.
Now, granted, I'm not an expert, but what I'm thinking is, the only way you get this many people to put this much money into something is because you've shown them that there is a massive output of money on the other end. That's what this feels like to me.
Yeah, exactly. I think that's sort of the right take, primarily. I mean, obviously they are burning through an insane amount of cash as is, just due to the employees. They're not profitable at all at this point; it's not clear what the margins are on any of these chatbots. But even that aside, they're still very much seeking to build AGI, right? That's their main thing, and that would involve training GPT-5. You know, GPT-4 cost on the order of hundreds of millions.
These frontier models in general, including Llama 3, are costing something like hundreds of millions, and GPT-5 could cost, you know, tens of billions, potentially. Which is crazy.
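(To give a sense of where numbers like that come from, here's a back-of-the-envelope compute-cost estimate. Every figure below is an assumption picked for scale, not a reported number.)

```python
# Rough Fermi estimate of frontier-model training compute cost.
gpus = 100_000            # assumed H100-class accelerators in the cluster
hours = 90 * 24           # assumed ~3 months of continuous training
dollars_per_gpu_hour = 3  # assumed blended cost per GPU-hour
cost = gpus * hours * dollars_per_gpu_hour
print(f"${cost / 1e9:.1f}B")  # -> $0.6B, compute alone, before salaries etc.
```

Scale any of those inputs up by 10x, as a next-generation run plausibly would, and you land in the billions being discussed.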
And that's the funny thing: tens of billions, they didn't get that, do you know what I mean? Now, granted, they've got Microsoft in their back pocket to help them do things, and there's that story where Microsoft is helping to turn Three Mile Island back on, and they're going to get all these resources that Microsoft is going to pull together for AI training and data centers.
But that's the thing that kind of worries me: if they don't have that thing, the next generation, kind of trained already, then there's a lot of money that's going to get burned through just to do that, right? Because they've got to buy the chips; they've got to do all this stuff. That's why I think GPT-5 is probably baked already, and this investment is a promise based on what the results of that are, right?
Like, I think that's probably what this is.
Yeah, I could see that as well. It's been, I forget, almost a year and a half since GPT-4, right? Quite a while. And we've had improvements certainly in that timeframe, for o1, sorry, the new naming scheme is so confusing. Their reasoning model certainly was a pretty significant leap, but still, it doesn't feel like they had GPT-5 and have released GPT-5.
So I guess that's got to be the question: do they have it, do they not? I think they definitely have started training it, probably finished. And I guess then you would need the billions for GPT-6.
Yeah, that's kind of what I think is probably the case: GPT-5 has at least started training, if not finished. And you're right, is it o1 we're going to get next? That's already done, right, with o1-preview out. Maybe it's going to be called o2 for all we know, or there's GPT-5 plus o2.
Like, I really wish we'd get some sort of significant thing, but it does feel like this raise is really about where it goes after that point. The other question is, at some point, this company, at a $157 billion valuation, this is kind of business-y, but a company like that has to go public at some point. And, by the way, maybe that helps OpenAI, because I do think in some form, OpenAI is probably a little Tesla-like, in that it would have a lot of meme stock potential.
So maybe if they go public, it just shoots their valuation up in a significant way, versus something that would be more difficult for a different company.
For sure, yeah. They have the brand recognition of ChatGPT, and retail investors, you know, know of that, so I could see that happening. And one last interesting detail here on the fundraise: it sounds like there was a provision that if OpenAI fails to restructure into a for-profit organization within a year, the investors can claw back the money.
So they're really promising here: we are going full-on for-profit, normal company, you know, forget this nonprofit business.
I mean, I think you have to if you're raising that much money, right? And again, you can argue that it should be a nonprofit, and there are a lot of people out there that believe that, but there's no way to get that much money from people and say, okay, guess what, we might still be a nonprofit.
Do you remember that original image from the OpenAI site, which talked about how it was a nonprofit and, you know, basically said it was a bad investment? That can't be the case when you're taking $6.6 billion; you have to be promising that these people are going to get, I don't know, two to five x, if not more, on the money they're investing, right?
And next, we've got a story that hasn't generated so much hype, but I do think it's kind of interesting: Google is bringing ads to AI Overviews. So they're going to start displaying ads in these AI-generated summaries provided for certain Google search queries, and will also be adding links to relevant webpages in some of the summaries. This has been something we've always wondered about, and it has been a conversation topic.
It costs a lot to do this kind of AI inference for search, much more than a normal Google search, so you need to somehow pay for it, right? And so we've been wondering, do we get ads? Is that going to be the business model? And here we go, we are starting to get it. There has also been, I forget if this is actually out, but I've seen information about this coming to Bing as well.
So I think it's not surprising, kind of what you might have expected, but still worth noting that nothing comes for free. And now, if you do use this, you will soon start seeing ads, labeled as sponsored, appearing alongside non-sponsored content.
Yeah, I mean, you have to realize Google was giving up probably the most valuable part of their search page to this AI search paragraph they were producing, because oftentimes that very first result is a sponsored ad, right? You can search for, I don't know, REI, and you get a Yeti ad at the top, right? Like for camping gear. So I think this only makes sense to me.
You know, the other thing that follows up on is the company Perplexity, which is an AI-native search engine. They also announced that they have a business model planned around replies: you put in your search query, you get your answer, and then when you reply, they're going to allow sponsors to reply back to you. Which is an interesting, different way of looking at this, because then you've already shown intention.
So imagine you're searching for, what's the best cooler I can buy for a camping trip? And it comes up and it says, these are the three coolers. And then you say, hey, tell me more about the Yeti cooler. Well, Yeti could come in, or REI could also say, hey, do you want to see a discount at REI? That feels like the give and take of what we're going to see with these things.
Like it or not, this is the money that has paid for the internet at large for a long time and grown these companies into, in some cases, trillion-dollar companies. And Google specifically is trying to figure out how to make this AI stuff make money. Up until this point, I think they were playing defense, trying to push back against the fact that ChatGPT was taking a ton of search.
I don't know about you, but I spend a lot more time searching in ChatGPT now than I thought I was going to. For questions that aren't specifically related to finding a product, I often ask in ChatGPT, which is probably at least half my search queries. So this is just a way, I think, that they can now monetize this thing that they've invested a lot of money in. I still don't know about the long term.
I think Google's search business is kind of borked, in that I think there are going to be a lot of different people coming along trying to do it in different ways, and Google's just got to find a way to hold on to at least half of that, right? If not more, because the vast majority of Google's business is search advertising.
Exactly, yeah. They really do need to find a way to stick around. And this feature of theirs, AI Overviews, is pretty comparable to what you've seen from others, you know, providing a summary of various webpages. It was a bit silly and made some funny mistakes early on, but I'm sure they've improved on it over time. The interesting question is, are they going to release a sort of optimized search for,
let's say, creative professionals, that will cost more, to rival Perplexity, right? Because Perplexity does have a more expensive tier of model and of search; you have to pay $20 a month to be able to do that more in-depth kind of exploration. And certainly I could see a lot of people paying that subscription cost. Yeah.
To be able to use that as a tool. I mean, at this point, I would probably pay for a Google search product. I already pay; in fact, I have so many dumb AI charges going against my credit card right now, I have to kill some of them. But I pay that 20 bucks a month, or whatever it is, to Google for the cloud storage, right?
And in some form, I wish that was like YouTube Premium, where I wasn't having to watch the ads, or I was getting a more dedicated Google search tool. And that again goes back to search subscription revenue versus search ad revenue. It'll be interesting to see.
I think the other big thing with Google is, if you listen to Nilay Patel from The Verge, he's really good on this: the entire internet is changing. It used to be based on search and SEO, and now it's going to be based on these kinds of AI searches, and that is just going to fundamentally change the business of the internet. It used to be
about how you get to the top of the search page, and now it's going to be a little bit more about, A, how do you get surfaced in these AI search engines or by the AI, but also, how do you make it so people go directly to you versus coming through the Google page? Which is going to be a weird transition point for the internet at large.
On to the lightning
round, with some shorter stories. We begin with a bit of a trend: Anthropic has hired another OpenAI-affiliated person, this time a co-founder, one of the lesser-known co-founders, Durk Kingma, who was there from the start but left OpenAI back in 2018 and joined Google. So this is actually not someone leaving OpenAI, as you've seen in a lot of cases. But nevertheless, Anthropic has a lot of former OpenAI people now,
and if nothing else, I'm sure that helps their case for being a strong competitor.
Yeah, I mean, Anthropic also has drops coming, I'm sure. If nothing else, Opus 3.5 has got to be baked and ready to ship at some point; I don't know what they're waiting on. But also, Sonnet and Opus 4 are probably coming: when you have a 3.5 level, you know there's a 4 coming at some point as well. So I'm excited to see where they go from here.
Next, in our stories on OpenAI, as always, we have a lot of these, and I like the title of this article: OpenAI's newest creation is raising shock, alarm, and horror among staffers. And it is their new logo.
So, supposedly, in a recent company-wide meeting, they presented this new potential logo, which is being described as a large black "O", or ring, basically something like a zero or the letter O. And this is moving away from their, I would say, pretty iconic at this point, hexagonal geometric logo that we've seen so much over the years. I would be pretty bummed if they actually do make this change, because this sounds a lot more boring.
Well, also, isn't the scariness of AI really helped by a giant black dot that's going to zoom in and out when it talks? Like, that is the scariest version of AI I can imagine: a black thing that just kind of, it's almost a 2001 sort of vibe, right? I hope this isn't the real logo, because it doesn't look compelling to me. It's not optimistic. It feels very brutalist,
if you know much about architecture, and I don't love brutalist architecture; I find it very bland and kind of cold. And I hope this isn't the case for what they're going to release.
Exactly, yeah. It sounds
very cold, very sort of inhuman.
Yeah, exactly. You'd think we would want to push it a little bit more towards the human side rather than the inhuman side. But look, maybe Sam in his heart is just racing us to ASI, and if we're all going to be batteries that get plugged in, maybe, maybe he's already made a deal with the AIs.
Yeah, who knows? Next, moving away from OpenAI, the next story is that Waymo is adding Hyundai EVs to its robotaxi fleet under a new multi-year deal. So this is about the Hyundai Ioniq 5 electric vehicles, a pretty expensive line of electric vehicles, comparable, you could say, to something like Tesla.
And they are saying that Waymo's sixth-generation autonomous technology, which they call the Waymo Driver, will be integrated into a significant volume of Ioniq 5 EVs to support their growing robotaxi business. For me personally, this is pretty exciting. It does seem like Waymo is a bit constrained in their ability to get their hardware out there. For a while now, anyone in San Francisco could use Waymo;
the waitlist is over, and now it takes like 25 minutes to get a Waymo. It's crazy.
Oh really? Oh wow. So I'm actually in LA. I've been approved, I haven't done it yet, but I'm on the beta test for it here. I might try to take one soon. Well, of course, right now they're all Jaguars, which is a very expensive car. And I also think the interesting thing with Waymo versus something like Tesla is obviously that the hardware you have to put on the car, with the LiDAR, is a more significant kind of lift than just shipping the car as is and hoping it works.
I do think though, the more I look at Waymo, and again, I haven't been in one yet, but I've learned a lot about it. I've been kind of obsessed with driverless cars for like 10 to 15 years, because I thought it was going to come sooner than this; Elon promised it a long time ago.
But I think this is in some ways as transformative as AI is going to be for the average person, because I think there's a world not that far away, if this is working, which it seems like it is right now, where 10 to 15 years from now you just don't need to buy a car, right? And I think that's a huge change to American culture. Now, will people still buy cars? Of course they will, right?
Like, there's going to be people that buy cars for fun, or just because Waymo doesn't service them. But when you think about city driving and getting around inside cities, I just think this is going to save people money versus car ownership. And that feels like a big step; just having more cars that they can put their technology into makes sense to me.
Like, I think ultimately the key to this, and what was always so interesting about Cruise, is that Cruise was trying to promise you could put their software and their hardware onto pre-existing cars. In a world where, say, Hertz finishes with their rental car fleet for the year, because they always turn their rental fleet over, imagine a world where Waymo just buys those, right?
And they're a lot cheaper, you can outfit them all, and then suddenly you've got an extra 100,000 Waymos. That feels like where we're going to be getting to, and I think that feels like a big step in the right direction.
Yeah, I totally agree. This was something that was being discussed early on, many years ago, when people were first excited about self-driving technology: this idea of the end of private car ownership. Once you can just call a ride for probably very little money, like you could always get an Uber and it won't be very expensive, and it'd be a self-driving Uber, maybe you don't need to buy a car. Personally, I would find that pretty exciting, but yeah,
I mean, why not? Right? Like, again, my wife and I have talked about it. We live in LA now, but when my oldest daughter is about to graduate from high school, we're going to go back to New York, I think, which is where we love living. And one of the most amazing things about New York is you don't need a car, right? You really don't. You can walk around and get to so many places, and it saves a giant cost, but also hassle.
I think in the world where that happens in most major metropolises, which I think is possible, I think we're getting there.
Next up, gotta have a story about chips and hardware on the show. So this one is about Cerebras, which we've covered many times. They have their own kind of cool chip design that's very different from the standard ones. They are filing for an IPO. So they have released their investor prospectus, and they are going to try to do this initial public offering, which is a way for investors to buy their stock on the public market, a way in which they could raise a lot of money.
Uh, so we haven't seen a lot of AI company IPOs; companies are staying private, like OpenAI, of course, getting a lot of investment. Cerebras is pretty mature. They've been around for a long time. They've had many rounds of fundraising. So I suppose it makes sense for them to try and get more money via an IPO. Yeah, I
mean, I assume these IPOs are coming for a lot of these companies. I mean, Cerebras at least is a chip company, so they can show scaling, especially if AI continues to go. Look at Groq, that company as well. I think there's going to be a lot of these companies that are hardware-based that are probably going to do OK.
I think the hard part, though, is that any of these companies is competing against conceivably the largest 800-pound gorilla of 800-pound gorillas we've ever seen, which is NVIDIA.
And my feeling with all these hardware companies is, OK, I know both Cerebras and Groq have shown significant advantages over NVIDIA, especially for AI, but you're competing against one of the most capitalized companies in the entire world, with thousands and thousands of employees fighting against you.
Like, it would just be a tricky investment for me, because I don't know if at an IPO level you can see something like this totally taking off. But again, I just think more of these hardware companies are going to get to this place.
And last up, a bit of drama. We always love having something that's a little more dramatic and perhaps amusing on the show, and this one is about a silly-ish thing that happened with a startup. The title is: Y Combinator Is Being Criticized After It Backed An AI Startup That Admits It Basically Cloned Another AI Startup. We talked about Cursor early on; that's an interface for coding. And now we have this story about PearAI, which is also an interface for AI coding.
They actually did a fork of an open-source interface called Continue. And the part of this that was bad is that they replaced the open-source license, which you're supposed to keep for this one in particular; you can't change the license. They just replaced it with what appears to be, or was said to be, an AI-generated one. They just had ChatGPT generate something and pasted it all over. And of course, they came in for a lot of criticism. They kind of admitted it.
And they did, by the way, point to the original project, so they didn't pretend this was theirs. But nevertheless, this is kind of a silly thing for a software company, and yeah, they came under a lot of scrutiny for sure.
Listen, in this world there is a very fast turn to the "you're a grifter" mentality. I don't know if you followed that whole Reflection 70B story, which was obviously quite a thing. I think a lot of people in this world are just trying to make things that are interesting and cool, though cloning software obviously seems pretty annoying.
And I think the bigger thing is, if these guys did this and they weren't a Y Combinator company, it probably wouldn't be nearly as big a deal. But Y Combinator has the kind of history it has, the kind of gravitas, with Sam having run it, but also, you know, Paul and all those other guys that started this company that Airbnb has come out of, and all these other sorts of things. I think in general these stories
get overblown quickly. But at the same point, it's kind of dumb for these guys to have done this, and done it the way they did. And I think part of that side of it is that everybody's just racing to do these things as quickly as they can, because they want to be able to make their impact on this world. There's a lot of people in this space who believe they have the next big thing and want to get to it before somebody else.
And sometimes I think it's just better to, hey, take a second and think about a truly unique, interesting implementation of this stuff, rather than how can we take this thing somebody else did and just do it in an open-source way. Although I will say, an open-source Cursor is not a bad idea in general; it's good to have something like that, but the business side of it may just be different. Exactly.
Yeah. And you're right to say this is not a huge deal; we shouldn't think of it as a big deal demonstrating that they're grifters. In fact, the bigger side of this is how it reflects on Y Combinator. So, for a lot of our listeners, you might know Hacker News is a site where programmers go to chat about these kinds of things.
And, of course, there was some discussion there and some criticism that Y Combinator has perhaps declined in terms of their vetting process, their due diligence. But now we're getting very inside baseball for Silicon Valley. So one thing I'll
say is that Y Combinator is kind of, let's call it the Harvard of startup founding, right? And so anytime a Harvard or somebody at that level screws up, there's going to be a thousand people that come out of the woodwork and start calling this kind of old generational thing out of touch, or clearly immoral in some way, because it gets the attention on it. I think Y Combinator is still a fascinating company. I think they do interesting things.
I think they give startups a really interesting lever to get off the ground. And it just feels a little bit like, what is it called? Sour grapes, a little bit, but also understandable.
Onto research and advancements, in which we have a couple of papers. The first one is titled "Were RNNs All We Needed?", and RNNs, if you don't know, are recurrent neural networks. That's the thing people used to use before transformers became a thing, where you basically have a bit of a loop of input and output, and you do the kind of intuitive thing, which is you take one input at a time and go through your input and process it.
This used to be the way we processed text before transformers, but that fell out of favor for various reasons. Basically, it was harder to train, harder to scale up and parallelize, due to how it works. And in this paper, which is co-written by Yoshua Bengio, notably a very influential figure in AI, they took the traditional forms of RNNs, LSTMs and GRUs,
and simplified them a bit, removed some of the things that required that inefficient training, leading to minimal versions of the RNNs that they say might actually just work, might be usable and comparable to transformers and things like Mamba and so on. So that's why they have that question of were RNNs all we needed. Now, they don't answer that question, because they don't scale up and actually compare at large scale, which is what we care about these days.
They only have initial results on relatively small datasets, where it does show pretty comparable and good results for the small amounts of data. So ultimately, they do kind of justify the question being asked; they don't so much answer it.
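To make that simplification concrete, here is a minimal sketch of the minGRU variant as I read the paper. This is illustrative, not the authors' released code, and the class and variable names are mine:

```python
import torch
import torch.nn as nn

# A sketch of the "minGRU" idea (my reading of the paper, not the authors'
# code): the gate z and the candidate state depend only on the current
# input x_t, never on h_{t-1}. Removing that dependency is what makes the
# recurrence computable with a parallel prefix scan instead of a slow loop.
class MinGRU(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_z = nn.Linear(dim, dim)  # update gate, from input only
        self.to_h = nn.Linear(dim, dim)  # candidate state, from input only

    def forward(self, x_t, h_prev):            # both shaped (batch, dim)
        z = torch.sigmoid(self.to_z(x_t))      # how much to overwrite
        h_tilde = self.to_h(x_t)               # proposed new state
        return (1 - z) * h_prev + z * h_tilde  # leaky blend of old and new

# Sequential reference loop; the paper's point is that the same recurrence
# can also run in parallel over time, since nothing in the cell reads
# h_prev except the final blend.
cell = MinGRU(16)
h = torch.zeros(2, 16)
for x_t in torch.randn(10, 2, 16):  # (time, batch, dim)
    h = cell(x_t, h)
```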
But let me ask a question as a non-technical person, and I'm sorry for the audience; I know the vast majority of the audience is more technical than I am. I always talk about this idea that LLMs can scale to a certain point, but then there's going to be a fall-off, the idea that LLMs alone are not the answer to AGI.
Is the idea with this like a parallel kind of pathway that could somehow intersect with LLMs to make a much larger model together?
Yes, I think that's pretty much it. And this has been a real trend sort of in the background, where people have been exploring alternatives to the traditional techniques for LLMs for a while. Because as you scale up, there's this famous quadratic thing: the longer your input is, the more it costs, at this quadratic rate. And there are ways to make it linear, where for every additional input you pay the same price, instead of it having this squared effect.
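As a rough back-of-the-envelope illustration of that quadratic-versus-linear gap (toy numbers, not a benchmark of any real model):

```python
# Self-attention compares every token with every other token (~n^2 work),
# while a recurrent model does one fixed-size state update per token (~n).
for n in [1_000, 10_000, 100_000]:
    attention_ops = n * n  # pairwise token comparisons
    recurrent_ops = n      # one state update per token
    print(f"n={n:>7,}: attention ~{attention_ops:.0e}, recurrent ~{recurrent_ops:.0e}")
```

At 100,000 tokens that's roughly a 100,000x gap in per-sequence work, which is why long context is exactly where these alternatives get interesting.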
This is another take on that, essentially. We've seen state space models, Mamba, and so on. We've seen xLSTM; now we have minLSTM and minGRU. So it's been ongoing for a while, and we still haven't seen any of these models go really big and be a game changer, but it certainly seems like this could be part of what allows us to get to those models. Got it. Okay. And next you've got MIO, M-I-O, I'm not sure what it is.
A foundation model on multimodal tokens. So is it "Tokens" or "tokens"? Which one is it? It's supposed to be tokens; you know, sometimes I say things wrong. But yeah, multimodal, we all know that's been the big trend of this year, things like GPT-4o, where you can take in multiple modalities, images, text, audio, and output them too. And multimodal tokens, what that means is that there are different ways to do this.
So you can train your model to have different encoders, to separately process images and text and combine them in the middle. Or you can do this training where you just interleave: you have image tokens, you have text tokens, you have audio tokens, and all of that is just part of your input. And you can mix and match in whatever way you want, which allows your model to be a lot more flexible. So that is the focus here.
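A minimal sketch of what an interleaved multimodal sequence might look like. The token IDs and special markers here are made up for illustration; MIO's actual vocabulary will differ:

```python
# One flat sequence over a shared vocabulary: the model never sees
# separate "image" and "text" inputs, just tokens with modality markers.
sequence = [
    "<img>", 50321, 50187, 51340, "</img>",  # image patch / codebook tokens
    "A", "dog", "barking", "at",             # text tokens
    "<aud>", 61002, 61115, "</aud>",         # audio codec tokens
]
# Because the output space is the same vocabulary, the model can also
# *generate* image or audio tokens mid-sequence, not just text.
```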
They are releasing, or saying they'll release, an open-source one of these multimodal large language models, one that will be able to take in a variety of input modalities and then also output a variety of modalities, specifically here four modalities. I think we don't have a very strong open-source multimodal model.
We've seen some efforts on that front, but nothing like Llama, and one of the exciting things about Llama 3.2, as we covered last week, was them adding vision as an input. So this is going beyond that. This is going into text, images, video, and would be pretty exciting. Yeah, I mean, I get
the Llama glasses thing; I keep coming back to it because I just think all this stuff is so friction-based right now to try to get access to any of it. But the minute it's in something that you're wearing all the time, or you're putting in your ears, one of those two, it feels like it opens the door to so many more use cases, rather than having to, I don't know, the Meta stuff, even with the multimodal.
I've spent some time with it, but to get access you have to go to Meta's chat, which is an annoying thing to get to. I just think the multimodal backbone of these models is going to get better and better, and then suddenly we're going to have a pair of glasses where it's like, oh, this is what this is. It's just going to get better in the next two or three years, and then we're going to have a pair of glasses, and it's going to be kind of a mind-opening experience.
And just one more story. We've got something from Apple, which is always fun to see. They are releasing Depth Pro, an AI model that rewrites the rules of 3D vision, according to this article title from VentureBeat. So this model, Depth Pro, can generate digital 3D depth maps from single 2D images, monocular depth estimation if you want to get technical. And this is obviously kind of a leap in degree of accuracy.
They have very high-resolution depth maps, and fast, in just 0.03 seconds. So you must imagine this came out of their work on their Vision Pro headset. This is essential for AR and VR: being able to estimate depth, which is how far stuff is around you. And they say this works in various settings, so you don't need to retrain the model for various environments. It'll just kind of work.
So it's a pretty challenging task, actually, and has been one of the important tasks in computer vision for decades. Being able to estimate depth used to mean you need two cameras and do stereo vision, like we do with our eyes. But nowadays, because of machine learning, you can get away with using a single camera and estimate depth pretty well.
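For a sense of what monocular depth estimation looks like in practice, here is a minimal sketch using the publicly available MiDaS model as a stand-in; I haven't verified Depth Pro's exact API, and the image filename is just an example:

```python
import numpy as np
import torch
from PIL import Image

# Load a small monocular depth model and its matching preprocessing.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform
midas.eval()

img = np.array(Image.open("room.jpg").convert("RGB"))  # any single 2D image
batch = transform(img)  # resize + normalize into a (1, 3, H, W) tensor

with torch.no_grad():
    depth = midas(batch)  # one forward pass -> per-pixel relative depth map

# MiDaS outputs *relative* inverse depth (larger = closer); Depth Pro's
# claimed advance is sharp, metric depth from one image, very fast.
print(depth.shape)
```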
So, yeah, it's cool to see Apple improving their machine learning, their ability to do cutting-edge stuff with the things required for their products.
It also makes me think, going back to the self-driving cars, so much of that is based on vision too, or LiDAR, or trying to find ways to parse out the depth of where things are. I just feel like that technology is going to improve so fast; now that we're doing all this stuff here, it's also going to go there. And the modeling of the real world seems like it's really the next big step, and this kind of feels like a way into that.
There's a lot of implications to go into, like for robots. That's another big trend, trying to get smart robots; they need this sort of stuff. And this is being open sourced, so the code and pre-trained model are being made available on
GitHub. By the way, I would love to know, and I'm sure it's got to be in somebody's brain, if not on a piece of paper somewhere at Apple: their humanoid robot. Because to me, as much as the Vision Pro and the glasses are going to be a thing Apple does clearly, and I think the glasses are probably, now that they've seen Meta's product, being pushed forward, there's
a real thought that maybe Meta released these things so they can get ahead of Apple, because Meta's saying it's still three years away, or maybe not even that long. But imagine Apple's humanoid robot project, which will just be sitting in a research room under six different locks for the next five years. They have a ton of data that could come out, and they always wait for others to come out first.
So, I know the Tesla event is coming, I think on Thursday of this week, right? On the 10th. It would be really interesting to see how this stuff also translates into an Apple humanoid robot in some form, which could be a big moneymaker for them and a next big product. I could definitely
see it. If you do buy a home robot to do chores for you, I could see Apple branding being a big differentiator. Now, it'll be interesting to see if they are going to commit to that, given the kind of disaster they had with the self-driving car project. Yeah, who knows, right? I mean, the thing is, Apple can commit billions of dollars to something and just bury it as a loss, which is pretty crazy in some form. But either way, cool to see them training this and releasing it. Yeah. Cool. On to policy and safety, and we begin with another sort of big story: the end of the SB 1047 saga. So we've covered the progress of this bill multiple times. It has passed the state legislature and has been waiting for the California governor to either veto it or approve it, and he has vetoed the bill.
So SB 1047, the regulation bill that would require safety testing of large AI systems before their release, also had some other ways it regulated large companies, like giving the state attorney general the right to sue companies.
Governor Newsom has vetoed it and has argued that it focused too much on regulating the largest AI systems, you know, model size, as opposed to the use side of it, the outcome of using AI. Which is one of the big debate points: should you regulate according to the size of a model and require things at development time, or should you regulate more at deployment time, on what happens?
So, you know, this obviously sparked a lot of conversations, sparked some thinking. This was a bit of a test, the first big push for AI regulation within the United States for things like frontier models, and it just got vetoed. And, uh, where
did Jeremy land on this? I was so curious, because I missed him. Have you talked about it? Was he for this bill?
I think the consensus, my impression, among safety people is that this was a good step, that this would have been useful, especially early on. I think people tend to be fans of the idea that at certain thresholds of size, you should have some requirements for safety testing of your models. So, certainly I think safety people were into it.
I mean, I think it's interesting. Obviously the hardest thing about this, and I'm not going to say I'm an expert on the policy side, but the hardest thing about this is how many people are voting on it who don't really understand this stuff at all, and might just be voting based on the idea of, we've got to stop these systems, without really having a sense of it.
I do think there is a real, I mean, I'm sure you follow the Chinese stories as well. There is a real thing to be aware of when you look at the development of what's going on in China. There was that story recently where they said they're starting to pull together training centers to make AI models in China, because it's the government and they kind of control everything; they can just bring all these companies together to train massive things.
So a part of me, from a political standpoint, is like, gosh, it's really hard to put any sort of barriers up against AI training right now, because of how it could become a geopolitical race. That said, I don't want bad actors screwing with these systems in a specific way either. We are maybe getting to a point where these could be pretty dangerous.
I mean, I think everybody who runs these companies has said, in some form, look, something bad will happen, and we have to assume that's the case. And that's the case with every new technology in some ways. And it's just one of those tricky things where what I would hate to happen is something really bad happens, and then the backlash is so strong that suddenly nothing is able to be developed in that space. So it's a very complicated conversation.
I think I kind of get why Newsom did this in part, but also there needs to be some version of this in the world, I feel like.
Right, yeah. The quote that sort of highlights the decision is: "I do not believe this is the best approach to protecting the public from real threats posed by the technology. Instead, the bill applies stringent standards to even the most basic functions, so long as a large system deploys it." So it's kind of arguing for something a little more like the EU AI Act, which is much more application-focused. Although that's
also stopped a lot of people in the EU from getting access to things like OpenAI's Advanced Voice and other things like that. Yeah,
exactly. So there's definitely a big trade-off to be had there. And it's not entirely surprising; I think we did see this coming, and I would not be surprised if we saw another iteration of this bill introduced in the next session. But either way, another long-running saga. We had this with the EU AI Act, where for months and years we followed the progress of the bill and the conversations and takes, and now that is over for this bill as well.
And the next story is also related to California law. In this case, it's about a judge blocking a newly passed AI law, related to a Kamala Harris deepfake that apparently Elon Musk reposted. So this new law was AB 2839.
We saw quite a few laws like this being signed. For distributors of deepfakes on social media, particularly deepfakes of political candidates that could potentially confuse voters, the law would require that you not post those kinds of things. And this was challenged by a poster of an AI deepfake of Vice President Kamala Harris. The argument was that the deepfake was satire.
And the U.S. district judge in this case ordered a preliminary injunction to temporarily block the law's enforcement, saying that the law is too broad and could lead to overreach by authorities. So, quite interesting to me. I could see the argument that this is satire, that the AI-generated video is in some sense satire. It doesn't mean that this is finalized; as we said, it's temporary, so I would be very curious to see how this goes.
Yeah, I mean, there's an argument here. You could say, well, what if you had somebody who was a Kamala Harris lookalike and you created a sketch where they did the same thing? Now, granted, it's not using her exact face, but it's still satire. And there is an argument here that you can really see as a First Amendment one. I mean, as somebody who's written comedy in my life and worked on late-night shows, you definitely want to have the ability to do some of that stuff.
And, you know, very famously, Trey Parker and Matt Stone, those South Park guys.
Yeah, they created a company called Deep Voodoo that basically helps them create deepfakes, and they had a movie they were going to release, which was an entire Trump deepfake movie. And you would not think of those guys, if they were doing something like that, as doing something illegal per se, right? Yeah. But in this case, if it's a deepfake, suddenly it's different from doing, like, a sketch.
And I think that is an interesting argument to make, that it's not necessarily about the technology that's used, but how it's being used. And that is a weird through line in some of this legislation that might be making AI into a boogeyman when the problem is more about how it's being put into play.
And just one more story. Google is investing 1 billion dollars in Thailand to build a data center and accelerate AI growth. So, not a huge policy story, but I have found it interesting seeing a lot of these kinds of investments being announced, with Microsoft and now Google investing a lot of money in foreign data centers being built, in this case in Thailand, but also in other countries like Vietnam.
And you must think that probably part of the reason is that they do need to spread out the compute for AI, right? It is very energy-consuming to do AI computation; as you do more and more AI tooling, you need more data centers and you use more energy, and so on. So a pretty big investment, obviously, to establish a data center and expand their cloud infrastructure, probably to have that for Asia, right?
And money will spread around the world, because it's eventually going to be about land, right? Like, where can I get cheap land? Where can I get cheap labor to help run these things? So I think you're going to see probably a lot more of this stuff deployed in different places around the world.
And moving on to synthetic media and art, just one more story to cover. And this one is kind of a cute and fun one, maybe not super important, but I thought to include it to end on a lighter note. So this one is about an AI reading coach startup letting kids create their own stories. They've launched this feature, Storytime, that allows kids to generate personalized stories by choosing from a variety of settings, characters, and plots.
So this AI companion listens to the child read along and corrects mispronunciations and missed words, and it offers two reading modes: one where the AI and reader take turns, and one where the AI does most of the reading. So you can listen to the AI telling your story, or you can also learn to read with the help of AI. And apparently it's already pretty popular; they're serving tens of thousands of families and they have 700,000 books read.
They're pricing it at $15 a month, with discounts for families receiving government assistance. So, you know, I mean, it's a really
cool thing. Like we talked about for a while, AI was going to start being a version of a tutor, right? Yeah. And this is kind of a no-friction, fun way to do that. This is not like, you know, send me my math homework. But kids, especially post-pandemic; I know somebody, my wife is a writing teacher, she actually teaches creative writing and is a novelist. And one of the things she said is that post-pandemic, a lot of kids really struggled
to learn to read, younger-age kids especially. So whatever you can do to get kids reading. And, you know, parents, always read with your kids if you have kids; obviously we did that, my wife did that a lot when they were young. But this is just an opportunity to do something other than, say, Roblox or something else, a fun way to get them to learn without really feeling that they're learning, which is kind of a cool thing, I think.
Yeah, exactly. I've always felt that one of the coolest things about this AI revolution is the way it makes really high-quality education very widely accessible, and this is an example of that. And my impression, I don't know too much about how kids are these days, but it's going to be interesting to see this next generation who potentially grow up with AI, right? How they differ from us.
But hopefully, yeah, you end up with some cute stories and the kids wind up getting addicted to reading. Which
is, the other thing I think is important, as a kind of last note on this AI education story: I think deep thinking is the thing we're going to have to teach the kids, because I think it's going to be easier and easier not to think deeply. So I hope that education,
the basics like this, is really fun and can be done this way, but then teaching becomes more about critical thinking, about deep thinking, about how to think about something and pick it apart, because the AIs are going to end up doing that for us. And I think people still need to have the ability to work through those things in their head.
So I hope it allows education to shift from more of a kind of rote memorization stuff or basic stuff to that sort of much more complicated reasoning and learning.
Yeah, definitely. And with that, we have gone through all the news stories, ending a bit shorter than usual, although that's still an hour and a half for this episode. I know, that's a long podcast. Yeah. So thank you for listening as always; we do appreciate it. As always, you can look at the episode description for the links to all the stories, or go to lastweekin.ai. Or lastweekinai.com; we actually have both. You bought the .com? Oh, nice. We got both. Yeah, yeah, yeah.
And as always, if you somehow aren't subscribed, then do consider doing it. And do consider reviewing, or just leaving a comment, or sharing. Always fun to see. Thank you, Gavin, for co-hosting. It was a lot of fun
as usual. Yeah, and you can go find our show, AI for Humans; YouTube is really an easy way to do it. Just go to YouTube and search "AI for Humans show" or "AI for Humans Kevin Pereira." Kevin is my podcast partner; his name will show up because he's a little more well known than I am. Or find us on any podcast service; we have a fair amount of people just listening to us as well. So yeah, check us out, AI for Humans.
Yeah, I recommend it. Give it a try. It's a lot of fun. But do please keep listening to this podcast as well. Of course. These guys are the OGs, I feel like. You've been doing it for what? When did you start? 2020? March
of 2020.
Yeah, you are the OG AI podcasters,
I feel like, or at least one of them. One of them, yeah, exactly. Lex Fridman used to have The AI Podcast before it was the Lex Fridman Podcast. Is that right? Is that what his podcast
was originally? Was AI? Oh, that's interesting. The AI
podcast, yeah. That's hilarious. Anyways, we are done. Enjoy the outro AI song.
[AI-generated outro song plays] Welcome to the future, let's dive in deep, where AI is the buzz and the data never sleeps. MovieGen, it's changing the game, taking leaps. Last week in AI, the talk of the streets, in the world of tech, where imagination meets.