¶ OpenAI's Major Strategic Shift
OpenAI cancelled Sora, and they just canceled Spicy Chat. They are fully focused on enterprise, reaching AGI, and a new frontier model called Spud. Yes, is this tater-based intelligence going to get their backs off the perceived wall? And why does Anthropic seem to outship them every single day? Plus, new AI music and audio models from Google. And Meta has a new AI that can supposedly read our minds.
They don't want what's in here, Kevin. They don't want what's up here. I got a strong feeling many of us don't, Kevin. Well, let me tell you something, because I'm thinking about this new ping-pong robot. Have you seen this? It's pretty incredible. I'm gonna take it and I'm gonna remake Marty Supreme. I'm gonna attach Timothée Chalamet and I'm gonna be... Hey, good night, sweet prince. This is AI for Humans.
Welcome, everybody, to AI for Humans, your twice-a-week guide to the world of AI. And today, Kevin, we are talking about OpenAI with their backs up against the wall. I walked away from my microphone there. Their backs up against the wall. They are trying to kind of slim, Ozempic, their whole company down to approach a new frontier model and the race to AGI. Let's get into one.
Actually, that is a perfect idea. That should be. So there's been some big news over the last couple of days about OpenAI.
¶ Sora and Spicy Chat Cancellation
The biggest news really, and we covered this very briefly in our last show, is that they have canceled Sora, the AI video program. And this is a big deal. They didn't just say, "We're gonna cut the Sora app," which some people thought at first. Did you see the videos of the never-AI people dancing in the streets, Gavin? Did you see them all celebrating, drinking Chianti and shooting off streamers? I will say this: I put up a video pretty much right after it happened and I got so many "Yay!" responses that it was crazy.
But this is a big deal. Not only did they shut down the app, but they've taken out the entire AI video program. They are not gonna lay in AI video anymore into ChatGPT. They have canceled their deal with Disney. They canceled the deal with Disney. A billion-dollar deal that much hullabaloo was made over, which I think is a Jungle Book character.
They're also killing the API, which means if you trusted OpenAI and built a business, any sort of application or whatever, on the back of this video model, bye-bye, toodles. Now, to those that celebrated in the streets, Gavin, they were saying, "Oh, this is it, generative AI is starting to die." I think that's missing the forest for the trees. Yes. Usually the same trees that AI is using to power these machines.
Because in its wake, there's a million different models from other providers. It's not like they're killing this due to lack of interest. I think they're killing this due to lack of focus within. Yes, that's right. And I think that's an important thing. You know, CapCut just today opened a new app that they're going to plug ByteDance's AI video model, Seedance 2.0, into.
There is probably a new Veo coming from Google, I would say closer to I/O, maybe in May. But another thing happened here, Kevin, which is just a good kind of overarching story, which is a thing we've been tracking for a while. This was known as Citrus mode at one point: they are fully canceling spicy chat.
¶ OpenAI's Spud Model and AGI
So there are a lot of people out there in the world, and I'm not gonna say I'm one of them, but I was interested in what this could do, who were interested in this idea that Sam Altman had mentioned: that they were going to allow spicy chat in ChatGPT. You had to age-gate it. But this feels like another kind of slimming down of the pathway to doing this stuff. I don't think you were necessarily excited about this, or I was necessarily excited about this, but this is something they had promised.
And now they're saying, "Mm, maybe not. We're not gonna do that right now." It feels to me like this is part of their move of saying, "Hey, business people, hey, you, person wearing a collar at your job and maybe a tie, we're here for you. We are not here for the guy in his bedroom at 2 AM trying to get I don't know what."
Do you have a stethoscope or beakers? We want you. You have a fedora and a fox mask? Hold on. No disrespect, but we need the money, actually. I think it's a smart refocusing. Look, first of all, we as humans can change our fetishes. So if you wanna get off to token talk, any LLM could be spicy. Any LLM. If you treat JSON like ASMR, and if you get that joke, welcome. You're fine.
It wouldn't be hard for them to just relax the guardrails, Gav, right? To say, "Hey, are you over eighteen? All right, now we're gonna let you have that chat." That to me is less the issue. The CapCut thing is interesting. If people are listening to this or watching this, what does that mean for them? Do they have access to the latest Seedance? Is it not out now?
It's still not out in certain territories, including the US. And actually, we did get a note from our last show that the VPN roundabout we thought was gonna work is not working yet. So they're doing some interesting stuff to gate it. I think Seedance is gonna cut a deal. The biggest thing I keep thinking about with AI video is that Veo and Seedance are the two companies that own the data. And when I say that, what I mean is that Google owns YouTube, and this video will be on YouTube. Every day there is some crazy number, like a million hours, of new video uploaded to YouTube. Seedance is owned by ByteDance, and ByteDance owns TikTok, which is also a platform with a lot of video getting uploaded. So at some point these companies were gonna kind of run away with it.
I think for OpenAI, the biggest thing that's going on here is that they, as we discussed last time, are starting to get lapped a little bit by Anthropic, especially when it comes to recurring revenue. The numbers are pretty close right now. So Kev, I think the other thing that's going on here is that this is a nice PR pivot internally to talk about their new model, which they're calling Spud. Now, there is not a lot of information about this. The biggest thing is that it's supposedly going to be released in two weeks, which is a very fast turnaround, right? We talked about how Anthropic is shipping all these features. If OpenAI delivers a significantly improved new model on top of GPT-5.4, which by the way just launched, what, three weeks ago, that is a big salvo in this space. And ultimately, people have talked about coding in business as being the use case that people are focused on right now.
I know this 'cause you know this too. I spent a lot of time yesterday with Opus 4.6 and Codex, both trying to squash bugs in the small thing I was working on, and I spent four hours trying to find one problem, and eventually I did. It would be nice to have that be twenty minutes. So if we're going to that point, that's fantastic. Yeah.
Yeah, look, anecdotally, and we'll talk about this maybe in a little bit when we talk about the game that I vibe coded over the weekend, I think 4.5 is the most capable model that's out there right now for coding. There's also 4.5 mini, which is very quick and also very capable for the price and the amount of usage that you get, especially when you turn on more thinking. They also had Codex Spark, which was an experimental model running on slightly different hardware that was lightning fast. So I think they have plenty of tricks up their sleeve. I think their models are fantastic. I think the software around them, their harnesses, needs to get a little bit better. And I think they know that as well, which is probably why they're refocusing. So I would certainly not count OpenAI out at all.
Yeah. I don't mind that they're refocusing. I wonder if they release Sora as an open source something, or release it as like a "here you go, community." Like, sorry, we could do something with it. I would be shocked. And here's why. Because I think the tricky thing, if they were really going to release it open source, is that all that stuff in there probably is, like, weapons free. When I say that stuff in there, I mean all of the faces, all of the model data, all the celebrities, all the everything. So I don't know, that feels to me like a tricky thing. I do think this is a kind of speed bump on the way to what we think about as AGI. We've talked about AGI on the show before, this idea of artificial general intelligence.
Sam did say, in this memo that went out to his team, quote, "things are moving faster than many of us expected." And this refers to the Spud model coming out in a few weeks. So I guess that means that we're gonna keep seeing this acceleration that we've noticed in the last little bit. And I guess that's not surprising. It just feels like, as you mentioned in the last show, we are in this kind of churning mode right now where things are just going faster and faster.
¶ Google's Latest AI Innovations
I mean, case in point, there are two new Google updates today that feel like, you know, a few years ago these would have been major hurrahs. There's a new audio model from Google that is really specifically important for agentic AI audio interactions, which you and I both know really well. This is Gemini 3.1 Flash Live.
And Kev, what I'd be interested in: let's play this tweet from Josh Woodward, because there's some audio in here and we'll get a sense of how it's improved versus what we had before. "Gemini, I'm at the gym. Give me a quick three-step finisher for triceps." "A great finisher is triceps pushdowns with a... attachment." Okay, so Kevin, is that a great finisher? I don't know this, but is that a great finisher? Yeah, that's a totally fine finisher.
Sure, if you wanna polish it off. I mean, so many people only work one section of the tricep. You really gotta get the bronchialis latera. You have to get in there and you wanna just keep tweaking and twerking until the... duh. Oh. Bleep that part out.
I was actually more stunned that they put in a little audio cue so you knew that the thing was actually processing, and it seemed like that tiny little audio cue was barely finished before it started talking, and that's really impressive. Yeah, I mean, the big thing here is they talk about it being really improved in agentic tasks, and again, it's that same sort of thing. You know, you and I have both been using coding agents.
Well, this is the idea of, could you put a voice on it so that it could figure out what you want quickly and then give you the answer you're looking for? This is one of the problems sometimes with voice: it doesn't have the time to go through the thinking process, because the thinking process is a delay. And if you're doing voice, what you really want is a direct answer.
I think that speed to answer is gonna be a big part of the next year of consumer AI, particularly, because I'm okay with waiting a while for coding answers. But I'm not that okay waiting for a tricep answer when I'm in the gym. General knowledge answers. Yeah. Exactly.
Plus, you gotta do your set and clear the machine. No one likes a loiterer, all right? I get my ring light, I get my tripod, I set it down, I do my one rep. And yeah, then I break it all down. I take my Rode mic off, I take my GoPro vest off, and I move away. I move away. I'm a fitfluencer. I know how this works. Also, along those lines, did you see the Flash-Lite stuff? I haven't seen this yet.
So they have Google Gemini 3.1 Flash-Lite, L-I-T-E, and it's flash, not flash. And they built the Flash-Lite browser to show it off. And they have a video, Gavin, and we'll put it on the screen. This isn't a pre-built website. They're just showing the speed with which this model can render images and text and format it into a website-looking presentation.
Someone's clicking around and you would just think that they're on, like, American broadband. It loads. Right. You know, it's a little slow, but it gets there, and everything that's on the screen, if you're getting the video version, is being rendered in real time. And literally on Tuesday, we were talking about a future where software is rendered almost like real-time video.
This is kinda that. This is the beginning of that. Imagine an interface, Gavin, where you're cutting a video and you go, "Oh, this isn't what I want to see. Give me a panel that shows me the special effects options that will add sparklers to my blah blah blah." And, "Okay, let me just render that for you. Here you go."
And also, just a side note, Kev: Google DeepMind just released this piece of research called TurboQuant, which actually speeds up access to memory in a much bigger way, which I think is gonna make these AI tools way more useful. Yeah, so the vector memories, they're stored in, like, 3D space, and usually to retrieve them, imagine someone giving you directions and saying, "Walk three blocks this way and then make a right at the Albertsons and walk five more blocks, right? Then you're good. Now you're at the dispensary." Why would those be directions I need? Follow me. Instead of that, Gavin, they'll just tell you, "Walk at 35 degrees, position yourself there, and just walk for three blocks."
And that's it. It's like polar coordinates. So it kind of compresses all of this data to get to the vector memory, puts it into these polar coordinates. But the point is, in fact, memory stocks took a slight dip on the heels of this, 'cause people are like, "Oh, we're gonna need less memory."
Nope. We all know a very, very popular paradox, which means that the demand for this stuff is just going to increase, and the expectations are going to increase as well. But this is cool. We know that we need a few more technological breakthroughs to get to AGI and then maybe beyond that, and this might very well be one of them.
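For anyone reading along who wants to see the directions analogy in code, here is a small Python sketch. To be clear, this only illustrates the "heading plus distance" idea from the conversation; it is not TurboQuant's actual algorithm, and the function names and the 8-bit angle budget are assumptions made up for the example.

```python
import math


def to_polar(x: float, y: float) -> tuple[float, float]:
    """Turn block-by-block directions (x, y) into (distance, heading in degrees)."""
    return math.hypot(x, y), math.degrees(math.atan2(y, x))


def from_polar(distance: float, heading_deg: float) -> tuple[float, float]:
    """Turn (distance, heading in degrees) back into (x, y)."""
    theta = math.radians(heading_deg)
    return distance * math.cos(theta), distance * math.sin(theta)


def quantize_heading(heading_deg: float, bits: int) -> int:
    """Snap a heading to one of 2**bits evenly spaced directions (hypothetical scheme)."""
    levels = 2 ** bits
    step = 360.0 / levels
    return round((heading_deg % 360.0) / step) % levels


def dequantize_heading(index: int, bits: int) -> float:
    """Recover the approximate heading from its small integer code."""
    return index * (360.0 / 2 ** bits)


if __name__ == "__main__":
    # "Three blocks this way, four blocks that way" becomes one heading and one distance.
    distance, heading = to_polar(3.0, 4.0)
    code = quantize_heading(heading, bits=8)  # the heading now fits in a single byte
    approx = from_polar(distance, dequantize_heading(code, bits=8))
    print(distance, heading, code, approx)
```

The trade-off the sketch shows is the usual one with quantization: the heading is stored in far fewer bits, and the point you get back is close to, but not exactly, the original.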
Yeah, and by the way, it's not the only Google announcement this week. They also announced Lyria 3 Pro. So this is their AI music model. You can now generate full songs, which is kind of interesting. Let me give you a soundtrack while you announce. Lyria 3 background music. Yes. I mean, listen, we said last time we tried Lyria 3, it's fine. It's not the most exciting AI music model. We felt like Suno's kind of advanced, but we do appreciate the fact that Google continues to build across all of these platforms. I think one thing that's interesting about Google is they have all this other money because they make money from ads on Google Search. So they are able to dump a bunch of money into these side projects in a way that a company like OpenAI, I just don't think, can right now.
Yeah. It's one of the first audio models where I don't notice that AI shimmer that's on everything. That was the first thing that grabbed me when I heard it. So there's clearly room to improve. And because I give them twenty dollars a month for my Google Voice account, like, yes, they're making money hand over fist everywhere. Yeah, exactly. And by the way, I'm very excited to see what comes out of Google I/O, because that's happening in May.
By the way, Kevin, we are invited, and I have signed us up. I know you may not be in town, but we can go to it this year. So we got invited. We got invited? We got invited, yeah. So we are able to show up at it, and I will happen to be in the San Francisco Bay Area. Tell him I got a thing.
¶ New AI Models and Video Tools
I wanna play two, just let 'em know I'll play two. Okay, too cool. One other cool audio note: Mistral, if you remember, is the French AI company that has been working on open source. This is an open source voice model, Kev, and it's very performative. You wanna play just a little taste of the open source Mistral model to see what we're getting here? "The one the world knows me by. And now it can travel further, not away from me, but more fully as m..."
Sounds very French. It sounds very French. This is existential French. Anyway, it is open source. You can go use it. It is called Voxtral. Voxtral TTS. And it's worth going to check out. Obviously, all open source models you can play with, download, and screw around with on your own. You know, anytime voice gets better, we love talking about it. I do think voice is kind of taking a backseat right now to coding, and even video is taking a backseat to coding.
Still, many cool things are happening at the same time. Is he digging on the couch? Who, Ollie? Yeah! Ollie, what are you doing? They did talk about... they did, yeah. Don't dig, Ollie. This is not our place.
Ollie. You dig. It's okay, especially because it's not your place. You go for it, buddy. You get to those. That's my dog. That's my dog, Ollie. Yeah, we should run quickly to multi-shot. Our friends at Runway. Yes, our friends at Runway have released a new app called multi-shot video. This is an idea that's basically, I'm sure, kind of a video harness, which is interesting. You're basically taking your screen grab or still or whatever, and you can put a very simple prompt in for it and you can get a video out. So if you want to look at this, I took this week's thumbnail with you and me and a lobster being boiled in a pot. And I just said, "Here, make something interesting where the lobster gets away." So play this. Pretty good. "You grab it!" I just needed a mildly interested French narrator to really grab me. We should have put some Voxtral over it.
It's funny when you say that, because there's one shot where we're both turned to the side and we have much larger noses than we actually do. So maybe they did make us French already, but
Again, if you didn't see, if you're just listening to this, that is, you know, Kevin and I from the thumbnail. We kinda turn, we see the lobster jumping out of the pot. The lobster jumps out, falls on the ground. It's all pretty well done. And I didn't prompt any of the scenes.
So this goes back to that idea that everybody in the world of AI video is kind of working towards: how can we simplify putting together shots, right? Because Seedance 2.0 is very good at this. You can add shots in. For the normal person, if they don't want to break it down by shots, this might be a really cool thing. That was one of the big out-of-the-box magical things about Sora when it first launched: it did a script breakdown behind the scenes to do all the shots.
Breaking news, Gavin! Oh no, what do we... Real-time translations just became even easier in the Google Translate app. Now with headphones on iOS, because it was already out on Android. On iOS, real-time language translation. So all the people that bought the new AirPods to try to get in: you didn't need it for real-time translation. Apparently you didn't need it. Just wait on Google. So, wow. Wow, cool. That's actually pretty interesting.
So that's them taking their audio model and expanding it out into consumer use cases. That is actually something I very much would use, and you will use very soon, probably, because you're gonna be in a different country. Yeah, that's amazing.
¶ Meta's Brain AI and Robotics
There's another huge story this week that kind of didn't get enough coverage. It is a science story, and maybe it didn't get enough coverage because Meta's going through it right now. If you didn't follow this, Meta and YouTube have both lost a pretty big lawsuit around social media. But Meta AI has created a new thing called TRIBE v2. And what this is, is basically a way to simulate what humans perceive in their brain, and it can essentially not only read thoughts, but also predict thoughts about what you're seeing. This is an AI model that actually sees brain firings. Meaning that if you look at something, it'll show you what part of the brain lights up when you look at a thing, and it's predicting this stuff.
So this is a combination of what's known as fMRI technology, which is kind of brain scanning technology, and AI to be predictive. And Kevin, a lot of people are saying that this is the step toward brain uploads, which, you know, we've talked about science fiction ideas before. But the idea that you could upload a brain, have it simulate what the sort of scenario is as a... We know that if we show them this cucumber-wearing-Oakleys graphic, a certain portion of his brain is gonna light up and he's gonna be more motivated to buy. That is the thing that worries me in some ways about Meta.
It's one of the funny things about Meta at large. We've talked about how Meta is kind of in the background right now on AI. They spent all this money on all these AI researchers. They did bring in Moltbook to try to clearly own the AI agent space, but Meta has a way of making everything they do about ads, right? And about their business model. Which, again, I don't necessarily think from a business perspective is bad, but maybe from a humanity standpoint, not so good.
Well, in this paper, the interesting thing was this was done about five years ago, but they didn't release it because they thought the TRIBE model was broken. They were showing all the brain scans and there was nothing lighting up, and they realized they were just testing on users in Horizon Worlds.
Oh damn, that was a long walk to get there, but we got there finally. R.I.P. Horizon Worlds. R.I.P. Horizon Worlds. R.I.P. Kevin's face up close to there. And finally, Kevin, we have an awesome robot story here. This is called Smash, the world's first high-dynamic humanoid robot for outdoor table tennis. Fully autonomous with onboard perception.
If you are just listening to the show, you should definitely come and watch this video because it is fascinating. Basically, what you're seeing here is a robot ping pong player. And ping pong, as many people know from Marty Supreme, is a pretty difficult thing for humans to do, especially because you're tracking a small ball and it's moving fast with weird physics. What are you laughing about? "You guys know ping pong from Marty Supreme." You've probably never seen or heard of it before. In the most recent world, you understand what pro ping pong is. Yes, yes, of course. But anyway, this is a very cool project. It's another sign that robotics is moving along very fast.
And I do think it's an important thing for people to realize. There was also a video recently, speaking of Figure 03, I don't know if you saw this, where it was walking with Melania in the White House, which was just this goofy-looking video. When you compare that walk to the ping pong robot, you just get a sense of how vast the difference is in terms of what people are pulling off right now.
There was, a couple of weeks ago, maybe we showed it, I don't know, we make so many of these now, the tennis-playing Unitree. It was fully stumbling around a tennis court and swinging the racket. Yeah, that, similarly. Impressive as hell. So do you think there's gonna be a Unitree tennis pro at the club that tries to creep up on the old country club ladies? Like, he's like, "Hey ladies, look at this. I can do a back... I can do a back..." Get behind you and show you the better grip. "Please, please stop, Unitree. Stop, Unitree." Continental. Continental tennis-playing robots. That's great.
But the real humans, Gavin, the real humans out there, they're too busy. They're too busy clicking like and subscribe and smashing the bell and leaving a comment to juice our algo down below. So please do that.
If you listen on any podcast apps, give us a five-star review and leave a comment there. That helps us grow. It's the only way we get any attention. Share us on social, promote us on LinkedIn, hashtag open to work. Also, Discord. You can be beta testing the tile-matching battle royale game, which is almost, almost stable and, dare I say, almost fun. I don't think I'll get it there.
I think I'll get it there. It's fun. It's fun. It is definitely fun. We are very excited to play more of Kevin's game. Yes, please come visit us in our Discord. Link is below. And we will see you all next week. Bye-bye.
