🎙️ EP 147: The Dish‑Doing Robot, The “Secret Sauce” Drop… and GPT Ranked #8?!

00:00

OK, so let's get into this. We're looking at three really different stories about where AI is going right now. We have this new household robot that's designed to be, you know, cute on purpose. Right. Then you have these AI models just battling it out in these secret benchmarks. And on the other end, a group just dropped the entire blueprint for a major language model. True open source. And if you need proof of just how intense all of this is getting, get this.

00:32

Meta is now asking the government for permission to trade electricity. Yeah, just to feed their AI. The stakes are getting incredibly high. Welcome to the deep dive. Our mission today is to unpack these sources for you and really figure out what matters most. We've got three main areas we're going to hit. First up, that dish-doing robot and the surprisingly simple cheat code they found for training it. Then we'll get into the latest

00:54

industry headlines. Who's winning the AI race and what does all that energy consumption really mean? And finally, we'll look at a really radical new definition of open source from the AI2 team that could change everything. Let's start in the home, in the kitchen, actually. Yeah, the cute robot. Sunday Robotics just came out of stealth with their Memo robot. And we've all been hearing that promise of a Rosie the Robot for, what, decades now? Right. A real helper

01:19

around the house. It's finally starting to feel a little more tangible, and I think it's because they solved the data problem in a really different way. The huge roadblock for home robots has always been the data. Right. Getting good data on how humans actually do things. Exactly. How do you open that specific drawer or pick up a mug without, you know, crushing it? It's all about dexterity. And traditionally that meant using a super expensive

01:45

teleop rig. Yeah. Where a skilled operator wears all this motion capture gear to control the robot remotely. It's precise, but it's also incredibly slow and costs a fortune per hour. So they just threw that whole idea out. Completely. They swapped it for the $200 skill capture gloves. $200. Yeah. Real people just put them on in their own homes and do chores. Live their lives. Pretty much. They're clearing tables, folding laundry, loading

02:13

dishwashers, even pulling espresso shots. It captures pure human motor skills, but cheaply and at a massive scale. That's brilliant. You're just crowdsourcing the training data from the 8 billion people who already know how to do the job. And the sources say this method generated 10 million episodes of data. 10 million. That's just, that is a massive shortcut. It is. It just rapidly accelerates how you teach a robot complex home dexterity. So how good is it? Well, the

02:42

results are pretty impressive. Memo can do a full table-to-dishwasher run. It's a sequence of 68 different dexterous moves. And the big question. The big question. Across more than 20 live demos, they recorded zero broken wine glasses. Wow. Okay, that's the real test. That's the real test. That kind of delicate handling tells you the data quality is there. But wait, if the gloves are that cheap, isn't filtering all that messy crowdsourced data just an absolute

03:07

nightmare? I mean, couldn't cheap data just lead to cheap dexterity? That's the core challenge for sure. But look at the team. It's a bunch of Stanford PhDs and ex-Tesla FSD engineers. They lived through that whole millions of miles data collection problem. They know how to filter noise. They explicitly said they chose to leverage 8 billion humans instead of millions of miles. OK, that background brings up some skepticism for me, though. Yeah. The FSD team was, let's

03:34

be honest, famous for overpromising on timelines? Mm-hmm. Does that affect how we should see their late 2026 ship date? It definitely raises an eyebrow, but I think the domain is more constrained here than full self-driving. And their other big choice is maybe just as important. The design. The design. The robot is soft. It's round. It wears a little hat. So cuteness is now a technical

03:59

feature. Exactly. Because it builds trust. If this little non-threatening robot makes a mistake while it's learning, you're more likely to forgive it and let it keep trying. Instead of unplugging it and throwing it in the closet. Right. User empathy actually becomes part of the training loop. So if we boil this down, leveraging crowdsourced human experience rapidly accelerates complex home dexterity. That's it. It just fundamentally

04:22

changes the robotics timeline. Okay, let's shift from the kitchen counter to the wild world of AI rankings. Yeah, it's a high-stakes race. And these benchmarks, they used to be pretty stable. Not anymore. There's a new one, the Humane Benchmark, that just sent out some shockwaves. It's designed to test really modern stuff like advanced reasoning, handling images and text, and safety. And the big surprise? GPT came in

04:46

at number eight. Number eight. I mean, that is shockingly low for a model that basically felt like the default for so long. If GPT is at eight, who's at the top? Top spot went to Gemini 2.5 Pro. After that, it's a tight race. You've got DeepSeek, Mistral, and then, get this, Grok 4 and Grok 3 took spots 4 and 5. So the performance gap is just closing incredibly fast. Faster than ever. And these technical leaps are leading to some pretty wild spectacle too. You mean the

05:14

Gemini 3 Pro ad on the Las Vegas Sphere? Yeah, that thing. People were genuinely arguing online about whether it was real footage or AI-generated. Because the quality was just too good to tell. That line is getting very, very blurry. And it's not just visuals, it's speed. We just saw the first fully driverless race cars in Abu Dhabi. I saw that. Hitting 155 miles per hour, racing wheel to wheel. That's all algorithmic decision

05:38

making at insane speeds. And then on the creative side, you have things like Google's Nano Banana Pro. Which is just doing crazy things with images. Fixing specific text inside a photo or blending, what, 14 different images together seamlessly. It's just... it's hard to even keep up. It really is. I have to say, even with all these powerful new tools, I still wrestle with prompt drift myself. Oh, really? Yeah. I'll find a model that gives me perfect output for like the first week

06:07

it's out. Then a month later, the consistency just degrades and I'm back to re-engineering my prompts all over again. Getting reliable output is still a huge challenge. That inconsistency makes the whole infrastructure problem even worse, doesn't it? We're talking about massive amounts of energy to run these things. Which brings us back to Meta. Bloomberg is reporting that because the grid is struggling to keep up with their AI hunger, they've asked the feds for permission

06:32

to trade electricity. They're trying to become an energy company. Essentially, yeah. A utility company just to service their own data centers. They're so power hungry, they're trying to shape national energy strategy just to keep the lights on for their models. Wow. So if we connect the dots here, the hidden cost of scaling these huge models is that AI is now powerful enough to dictate national energy infrastructure strategy. It changes the entire planning horizon for the energy sector.

07:00

We hear the term open source a lot in AI. And for a while now, that's usually just meant one thing, sharing the model weights. Right. And we should probably define that. The weights are just the final set of numbers that make the model work. It's the finished product, but not the recipe. Exactly. So if you're a researcher trying to figure out why a model is biased, just having the weights is not enough. It's a black box. You can use it, but you can't truly understand

07:26

it. Which is why this new release from the AI2 team, Olmo 3, is such a big deal. They've radically redefined the term. They're providing everything, the full training data, every line of code, every training checkpoint. The checkpoints are like saved games, right? The model at different stages of learning. Exactly. And every decision they made along the way. It's a full-on anti-black-box effort. It's all about traceability. That is a level of transparency we just haven't

07:53

seen. It's like, imagine if Tesla didn't just give you the car blueprints, but handed over the entire assembly line. Okay. The notes from the engineers, the faulty parts they threw out. Every software tweak. It completely opens up the science behind it. Let's talk about the actual models. They released Olmo 3 Think, a 32 billion parameter model. And that one is focused specifically on chain-of-thought reasoning, so it's perfect for researchers. And you have the base models,

08:18

a 7B and a 32B. And these have a huge 65,000 context window. That's 16 times bigger than the last version, so they can handle enormous documents for coding, math, comprehension. And there's also a smaller 7B Instruct model, right? Tuned for chat and using tools, which is great for running locally. Yeah, but what's really revolutionary is the cost efficiency here. Right. The sources say the 32B model rivals the performance of something like Qwen. But it used six times fewer training

08:49

tokens to get there. Six times. That's an incredible reduction in resources. Which means the training cost was only about $2.2 million. That's, that's astonishingly cheap for a model this capable. It's like building a high-performance rocket on a motorcycle budget. It just shows what smart data curation can do. Whoa. I mean, imagine scaling that kind of full traceability to a billion queries, knowing exactly why you got a certain output, not just guessing. And

09:16

it's all out there now. AI2 Playground, Hugging Face, you can run it locally. And it's all under the Apache 2.0 license. That permissive license is key. It is. But that ability to trace a response all the way back to the original training recipe? That's the unique part. Researchers can now dig in and see exactly where a bias comes from or

09:36

why the model learned a certain thing. So if you look past just the low cost, the single biggest benefit here is that complete transparency allows for critical investigation and much faster model improvement. It just eliminates all the guesswork. So we've really covered three huge shifts today. We have, first, robotics found a way to achieve complex dexterity, not with expensive gear, but with a clever, cheap, crowdsourced data trick,

10:03

the $200 gloves. Right. Then second, the AI performance race is shifting the whole landscape and putting this just unbelievable strain on our energy grid. Meta trying to become an energy trader is the perfect example of that. And finally, the very definition of open source has been expanded in a radical way. Yeah, showing that efficiency and transparency can actually rival sheer scale. These three things, dexterity, strain, and transparency,

10:27

they're all colliding right now. And that collision is going to define the next few years in tech. That's the big takeaway. So here's a thought to leave you with. If these highly efficient, fully traceable models like Olmo 3 become the standard, models that cost just a couple million dollars to train and give you total insight, how quickly do the old proprietary black box AIs become obsolete? The ones that demand Meta-sized power grids start to look really expensive and you can't

10:56

even see inside them. Something for you to consider. That's a really important question for where this whole industry is headed. Thank you for joining us for this deep dive. We look forward to diving into the next stack of sources with you.

Transcript source: Provided by creator in RSS feed
