🎙️ EP 267: Google’s New "Googlebook" & Mira Murati’s TML Breakthrough

00:00

Think about the last time you spoke to an AI. Yeah. It is usually a pretty jarring feeling. You try to interrupt the machine. An awkward silence stretches out. The system clunks through a messy reset. It just feels completely unnatural. It really does. It breaks the illusion immediately. Now, change that picture entirely. What if your AI could, you know, see you frown? What if it stopped talking mid-sentence and asked, "What is wrong?" That completely changes the paradigm.

00:25

It does. Welcome to this custom deep dive. We are very glad you are here with us. Today, we are exploring a massive technological shift. We are tracking the move from clunky AI tools to fluid proactive partners. It is a big one. First, we will unpack Google's brand new Android updates. Then we are going to examine the staggering capital engines operating behind the scenes. We are talking about OpenAI, SoftBank, and Anthropic. The scale of the money is just wild. It truly

00:54

is. Finally, we will look at a major breakthrough from Mira Murati's Thinking Machines Lab. It is a fundamental shift that promises to end the awkward AI silence forever. It is a profound transition. We are moving past the novelty phase of AI. Yeah. The technology is weaving itself into our daily routines. It is basically becoming an invisible layer of reality. That is exactly what struck me about Google's latest announcements. They just wrapped up the Android show. It was

01:22

their special I/O edition. Right. And honestly, the whole ecosystem is shifting. We are looking at a radical redesign of how we interact with software. We are seeing a definitive move away from reactive tools. Consumer tech is becoming a proactive assistant. It anticipates what you need. It doesn't just wait around for a typed command anymore. Right. And the clearest example of this is the Googlebook. They announced this entirely new type of computer. It is built from

01:48

the ground up for Gemini. The operating system architecture is basically inverted. How so? Well, traditionally, the OS manages files and apps. Here, the OS centers entirely on the AI model. The AI is the kernel. Wow. They showcased a feature called the Magic Pointer too. It is essentially an AI-powered cursor. Yeah, that was fascinating. What is really interesting is how it bridges devices. You can use all your mobile apps on the big screen seamlessly. That removes a massive

02:15

layer of friction. And the cursor itself actually understands context. Right. It uses computer vision. Exactly. It knows what you are hovering over. If you hover over an address, it predicts you want a map. It anticipates your next action before you click. Okay, but the vibe-coded widgets feature is what actually stopped me in my tracks. Oh, man. You just describe a widget you want, maybe something incredibly niche that doesn't exist yet, and Android just builds it for you

02:42

on the spot. Oh, it is wild. It completely flips how we interact with our phone's interface. You don't search an app store anymore? Nope. You just articulate a desire. The operating system handles the creation instantly. It is like magic. Under the hood, a lightweight language model writes the code. It compiles the UI components in real time. But wait, let's unpack that. Isn't that a massive threat to developers? Oh, absolutely.

03:05

If the OS dynamically generates custom widgets, nobody is buying third-party apps for those microtasks anymore. Right? And that is the hidden disruption here. It democratizes software creation for the user. But it absolutely cannibalizes the lower tier of the developer ecosystem. The middleman is eliminated. That makes sense. We are also seeing major upgrades to Gemini intelligence. It is actually taking real-world actions now. Yes. And this is the shift from a passive oracle

03:34

to an active agent. Right. It moves beyond generating text into executing multi-step workflows. The example they showed was brilliant. You take a photo of a concert flyer on a telephone pole. You simply tell Gemini to book a hotel. It goes out, navigates the travel sites autonomously, and finds a room near the venue. Think about the underlying mechanics there. It parses the image using a vision model. Yeah. It extracts the location, the band name, and the date. It

04:01

cross-references your calendar. It accesses your payment preferences. It is doing all of that in the background. Exactly. Then it executes a complex sequence of web navigation tasks. It acts exactly like a human assistant. It is staggering. And alongside these massive shifts, there are smaller quality-of-life improvements. Right, like in Gboard. Yes. Gboard now has a feature called Rambler. It scrubs your voice-to-text

04:24

inputs before they ever hit the screen. It automatically removes your ums and mid-sentence corrections. It uses on-device models to filter acoustic garbage. It is so helpful. It makes human communication look much cleaner. It basically edits our natural verbal stumbling in real time. Then there's the Pause Point feature. This one feels deeply necessary. I love this one. It forces you to wait 10 seconds before opening distracting apps, things like TikTok or Instagram. It gives you a moment to

04:54

decide if you want to read a book instead. It is a great intervention. I have to admit something here. I still wrestle with doomscrolling myself. We all do. Our brains are hardwired for that quick dopamine hit. Yeah. Pause Point acts as a digital speed bump. Yes. It inserts a crucial moment of reflection into a compulsive habit. It fights algorithm-induced addiction with a counter-algorithm. They also announced some practical Quick Share updates. You can easily

05:18

share files with iPhones via QR codes now. Moving your whole digital life from iOS to Android is getting much smoother. They are actively breaking down the walled gardens. The strategy is to make switching ecosystems entirely painless. Exactly. Lowering the barrier to entry. Security got a fascinating AI upgrade too. There is a new theft detection lock. This is very clever. The AI actually senses when a phone has been snatched out of your hand. It automatically locks the device.

05:49

It also aggressively reduces the number of PIN attempts a thief can make. That is a brilliant use of hardware sensors. It uses the phone's accelerometers and gyroscopes. Right. The machine learning model was trained on the physical signature of a theft. It knows the exact g-force and angle of a phone being grabbed and someone running away. Wow. It reacts faster than human reflexes could. And on a lighter note, they fully redesigned

06:13

all 4,000 3D emojis. Right, because it is the little things that keep users tethered to an ecosystem. Looking at all this together, the ecosystem shift is vivid. It's like turning your phone from a passive filing cabinet into an active copilot. That is the perfect analogy. The device is no longer just waiting for your manual input. It is observing, predicting, and acting. But this raises a profound question. Does removing all friction with things like Gemini intelligence

06:40

make us lose our own cognitive maps? That is a very valid concern. If the AI does everything, do we forget how to do things ourselves? That is the classic dilemma of cognitive offloading. Every new tool fundamentally changes our baseline expectations. Like when we got GPS. Exactly. When we offloaded navigation to GPS, our spatial memory demonstrably weakened. But it freed up mental bandwidth for other things. Right. We are outsourcing routine digital logistics to

07:08

AI now. We might actually forget how to manually navigate a travel site or build a spreadsheet. But the theory is we gain time and energy for higher-level thinking. It is a permanent tradeoff. So we trade a little self-reliance for ultimate daily efficiency. Got it. We're redefining what we consider a worthwhile use of our human time. If we are outsourcing our daily logistics to these seamless assistants, it takes a terrifying amount of server power to keep that illusion

07:33

alive without lagging. Oh, the compute power required is mind-boggling. Which brings us to the actual physical engines running all this. The numbers we are seeing are massive. The scale of capital deployment right now is unprecedented. We are watching a historical shift in global resource allocation. Let us look at OpenAI. They just had a massive liquidity event last fall. It was valued at $6.6 billion. It is hard to even fathom that number. Roughly 600 employees

08:02

cashed out. Around 75 of those employees walked away with over $30 million each. That fundamentally changes the internal dynamics of a company. You suddenly have a large group of incredibly wealthy engineers. Yeah. It shifts the culture from a scrappy research lab to a legacy institution managing massive wealth. It certainly changes the stakes. But the infrastructure spending happening outside the company is even bigger. Way bigger.

08:27

SoftBank is making aggressive moves. Their CEO is discussing a $100 billion AI investment in France. That is an astonishing figure. It is literally larger than the G. It includes building massive new AI data centers. They want to radically expand the country's computing power infrastructure. This is deeply tied to national AI sovereignty. Countries are realizing that compute power is the new oil. If you don't control your own compute, you are at the mercy of foreign tech giants.

08:59

That's true. SoftBank is positioning itself as the primary financier of this new global infrastructure. They are pouring the concrete. The corporate drama behind the scenes is just as intense as the spending. Sam Altman recently testified under oath. That testimony was wild. He shared some fascinating and honestly bizarre details about his relationship with Elon Musk. The history between those two founders is incredibly complicated. It is the defining origin story of modern AI.

09:27

Altman testified that Musk once suggested OpenAI could pass to his children if he died. I still cannot believe he said that. Altman called it one of the hair-raising moments of their relationship. It highlights the philosophical clashes they had early on. They weren't just arguing over a software product. They were arguing over who should legally control artificial general intelligence. It sounds like science fiction, but to them, the stakes were very real. Meanwhile, Anthropic is throwing

09:51

its own weight around. They are hiring a literal Claude evangelist. The job pays up to $315,000 a year. They want someone specifically to help founders and VCs adopt their AI products. It shows how critical developer adoption has become. The best model doesn't win automatically. You have to fight a ground war for mindshare among the people building the apps. Anthropic is also flexing its capital in acquisitions. They are reportedly in late-stage talks to buy a startup called

10:22

Stainless. This is a huge deal. It builds crucial developer tools. OpenAI and Google actually already use Stainless. Anthropic might buy it for over $300 million. That is a brilliant, aggressive chess move. If Anthropic owns the underlying tools that their biggest rivals rely on, they gain a massive structural advantage. It is about owning the plumbing of the entire ecosystem. Speaking of advantages, OpenAI is fighting back with new technology. They just released something

10:50

called Daybreak. This is their new security play. It is their direct answer to Anthropic's Claude Mythos. Security is rapidly becoming the next major battleground for these models. Daybreak combines two distinct systems. It uses GPT-5.5 cyber and Codex security, an AI guard dog catching software bugs before hackers do. The AI writes the code. But now it also audits the code autonomously. Which is crazy to think about.

11:18

It runs adversarial attacks against itself. It hardens critical systems against zero-day exploits at machine speed. There are also some incredibly fun details emerging from all this coding. The team behind Claude Code shared a strange tip. Oh, the markdown thing. Yes. They said developers should avoid using markdown files for instructions. Funny enough, they called the tip unreasonably effective. Sometimes the behavior of these massive models defies logical explanation. It really

11:44

does. It is likely a quirky artifact hidden deep in the training data. The model just pays better attention to raw text. And everyday people are using these massive models for entirely different, hilarious things. Like what? A fan used AI video tools to insert himself into the movie Titanic. He somehow fixed every single fan complaint in a single video. He saved Jack. Right, and he basically became the hero the movie needed.

12:11

Exactly. It shows the incredible, chaotic, creative potential of these tools once they hit the mainstream. But looking at the big picture, a massive question remains. With SoftBank dropping $100 billion and OpenAI minting millionaires overnight, are we building foundational plumbing or is this just an unsustainable arms race? It certainly looks like an arms race from the outside. The valuations are undeniably astronomical. Right. But we have to look at what they're actually

12:35

building with that cash. The data centers are real. The power grids are real. They are physical assets. Exactly. They are pouring actual concrete and laying thousands of miles of fiber optics. Even if an economic bubble bursts, that physical infrastructure remains. It will power the next generation of software, regardless of which specific company wins the AI war. Massive capital is literally laying down the physical plumbing for the future. Makes sense. Yes. It is very similar to the telecom

13:06

boom of the late '90s. How so? The companies went bust, but the fiber they laid gave us the modern Internet. Wait, I have to push back there. Telecom laid neutral fiber that anyone could use. These AI data centers are highly proprietary walled gardens. SoftBank isn't building a public park. Isn't that fundamentally different? That is a very sharp distinction. You are right. Yeah. The physical hardware exists, but the access is strictly gated by the corporate giants.

13:33

It is infrastructure, but it is privatized infrastructure. The billions of dollars we just discussed are all chasing one holy grail. They all want to make computers feel completely, seamlessly human. That is the ultimate goal. This leads perfectly into Mira Murati's new venture. It's perhaps the most exciting technical development we are covering today. It changes the core interaction paradigm completely. Murati is the former CTO of OpenAI. She recently founded Thinking Machines

14:01

Lab. We will call it TML. Right. They just dropped a research preview that is stunning. It focuses on the one thing we actually hate about talking to AI. The awkward silence. Yeah. The fundamental lack of human conversational rhythm. Exactly. They are trying to end the chat bubble interface forever. TML built a system that collaborates in a live, continuous streaming loop. No more waiting. It processes your voice, video, and text simultaneously. This is a radical departure from turn-based

14:30

interactions. Standard AI operates like a walkie-talkie. Yeah. You speak, you wait, it speaks. With TML, you no longer wait for the model to finish generating a response. The model perceives and processes data in tiny rapid bursts. It operates in chunks of just 200 milliseconds at a time. That is incredibly fast. It is. This allows it to make natural human noises. It says "mm-hmm," "yeah," and "got it" while you are actively speaking. Human conversation is incredibly complex sociologically.
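
A rough sketch of the 200-millisecond chunking idea described above, written in Python. This is not TML's implementation. The `Frame` fields, the `react` policy, and the reaction labels are all hypothetical, chosen only to illustrate grouping incoming input into fixed 200 ms windows and picking one lightweight reaction per window instead of waiting for a full turn.

```python
from dataclasses import dataclass

CHUNK_MS = 200  # chunk length mentioned in the episode


@dataclass
class Frame:
    t_ms: int               # capture time in milliseconds
    speaking: bool          # hypothetical flag: user is talking in this frame
    confused: bool = False  # hypothetical flag from a vision model


def chunk_frames(frames, chunk_ms=CHUNK_MS):
    """Group frames into consecutive fixed-length windows by timestamp."""
    chunks = {}
    for frame in frames:
        chunks.setdefault(frame.t_ms // chunk_ms, []).append(frame)
    return [chunks[key] for key in sorted(chunks)]


def react(chunk):
    """Pick one reaction per chunk (toy policy, an assumption)."""
    if any(frame.confused for frame in chunk):
        return "clarify"   # stop and clarify when the user looks confused
    if any(frame.speaking for frame in chunk):
        return "listen"    # keep listening, e.g. emit a short "mm-hmm"
    return "respond"       # user is silent, so the model may speak


def run(frames):
    """Return the reaction chosen for each 200 ms chunk, in order."""
    return [react(chunk) for chunk in chunk_frames(frames)]
```

For example, `run([Frame(0, True), Frame(150, True), Frame(210, True, True), Frame(450, False)])` treats the first two frames as one chunk, and the later two as separate chunks, so the loop never waits longer than one window before deciding how to react.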

14:57

We rely heavily on those subtle back-channel signals. They are so important. They let us know the other person is tracking our thoughts. Standard AI lacks this entirely, which is why it feels dead. It even reacts to your facial expressions before you finish your sentence. Wow. If you look confused mid-thought, the AI stops and clarifies its point. That requires incredibly low system latency. 200 milliseconds is the magic number. Why that specific number? It matches

15:23

human conversational gap times perfectly. If it takes longer than that, our brains register it as an awkward pause. To stay that impossibly fast, TML uses a unique two-layer architecture. This is a brilliant engineering solution to a very hard physics problem. Because you cannot run a massive intelligence model in 200 milliseconds. Exactly. You just can't. So they split it up. They use a live model to handle immediate superficial interaction. Then a background model does the

15:52

heavy intellectual lifting. One brain for quick reactions, another brain for deep thinking. Exactly. It is like the human amygdala versus the prefrontal cortex. That makes sense. The live model manages the social dynamics and the hmms. The background model pulls the factual data and structures the complex argument. They work in tandem seamlessly. Because it processes video in real time, the real-world capabilities are wild. The model can literally watch you exercise through your

16:22

phone camera. Oh, yeah. It will count your workout reps out loud. It acts as an interactive physical coach. It sees your posture, analyzes your form, and corrects it instantly. It can also translate live speech directly from a TV screen. It can speak up proactively if it sees something change in your environment. Right, like telling you your coffee has got to stop. It has genuine situational awareness. That is a massive leap forward from a text box. Think about standard AI voice modes

16:48

today. If you have ever tried to interrupt one, you know how clunky it feels. Oh, it is terrible. It refuses to stop or it glitches. It breaks the illusion of intelligence immediately. TML is placing a massive contrarian bet here. They believe the future of AI isn't just about raw, escalating intelligence. No, it is deeply about latency, empathy, and conversational flow. Whoa. Beat it. Imagine scaling those 200 millisecond

17:15

reactions to a billion queries. It requires an entirely different approach to server architecture. That is why the two-layer system is so crucial. Yeah. They are basically rewriting how data moves. But I have to ask a strategic question. If TML is betting everything on latency and flow rather than just raw smarts, can they actually survive as a standalone company? That is the million-dollar question. Or is this just a preview of a feature that Google or OpenAI will eventually

17:42

absorb? That is the ultimate Silicon Valley dilemma right now. Interaction models do create a powerful user moat. True. If users fall in love with that fluidity, they will not want to go back to clunky walkie-talkie AI. However, tech giants have immense distribution power. Right. Google already has the phones. Exactly. Google can push a software update to billions of Android phones overnight. TML has to build their user base from absolute

18:07

scratch. That is a tough climb. They are pioneering the interface, but the giants are watching closely, and they have infinite capital to copy it. They either redefine the whole industry or get swallowed by a giant. Fascinating. It will be a thrilling narrative to watch unfold over the next year. Let us step back and look at the big picture. How do all these disparate pieces fit together? Today we can trace a very clear through line

18:30

across all these stories. AI is fundamentally moving from reactive text boxes to omnipresent real-time partners. We see it in Google's Android, with proactive widgets building themselves. We see it in the massive physical infrastructure being built globally to support these complex agents. Right. And we see it most clearly in TML's 200-millisecond conversational bursts. The new metric for AI isn't just how smart it is on a test. Exactly. The metric is how seamlessly it flows

18:59

with human latency. It is about emotional intelligence and frictionless interaction. It is a profound shift in our relationship with technology. I want to leave you with a final thought to mull over. We talked heavily about ending the awkward silence today. We talked about fluid, proactive interactions. If an AI can read

19:16

your microexpressions perfectly and interrupt you with the precise level of empathy in just 200 milliseconds, at what point does talking to a machine become more comforting than talking to a human? Wow. Thank you for joining us on this deep dive. We will see you next time.

Transcript source: Provided by creator in RSS feed
