🎙️ EP 283: Google’s Wild "Dreambeans" App & Gemma 4 12B Crushes Local Multimodal AI

00:00

It is, uh, it's pretty strange to think about sometimes. Yeah, it really is. Your devices are constantly mapping your daily reality, but they are also actively trying to reshape it. Right. They record our habits quietly in the background, then they subtly try to alter our future behavior. And it happens without us even really noticing it. Welcome to this personalized deep dive. We have a, uh, a truly fascinating journey ahead for you. We really do. We're covering some wild territory today.

00:28

When you see an app crunching your life's data overnight, it seems, you know, kind of quirky. Yeah, it's a fun little consumer toy. Exactly. But you zoom out and you see the bigger picture. The sheer computing power required is driving a massive infrastructure race. A $700 billion race, to be exact. Right. And that massive race is creating a powerful counter-movement. We're seeing a shift toward entirely private, open source, local models. Bringing all that power

00:57

right back to your own laptop. So we're going to begin at the extreme consumer edge today. This is where massive tech meets your morning routine. Yeah, right at the breakfast table. Google Labs recently dropped a highly experimental new app. It's currently available for both iOS and Android platforms. The project is officially called Dream Beans. Which, I gotta say, sounds like a complete fever dream. It really

01:21

does sound a bit bizarre. But the technology running behind it is absolutely fascinating. Dream Beans relies heavily on your permission-backed Google data. It curates highly personalized stories every single morning. And we aren't just talking about a couple of alerts. It generates roughly 10 to 14 of these stories. And they aren't just simple text updates either. They're fully animated, narrative-driven stories about your actual life. It is wild. The app crunches your

01:48

massive data footprint while you sleep. Right. It looks across your entire connected Google ecosystem. It checks your calendar, your maps, your search history. Then it distills everything into a concentrated morning drop. Let's look at a specific example from the source material. Okay, yeah. Say you have "Get a puppy" on your calendar. The app actively sees that specific calendar event. It then generates a custom animated guide just for you. Right. It shows exactly what

02:13

to expect that first week. It basically creates a customized narrative out of your own schedule. Yeah. And it pulls in relevant tips and weather data. It even integrates your location info seamlessly. It's like having a personal Pixar studio on your phone, just animating your daily to-do list every single morning. That is a perfect way to visualize the experience. It makes your mundane daily tasks feel much more engaging. Where did that bizarre name actually come from anyway?

02:39

Product lead Gazda Osner explained the origin quite recently. Okay, what was the reasoning? Well, the dream part of the title is quite literal. The system does all its heavy lifting while you sleep. Ah, so it processes your digital life while you're actually dreaming. Exactly. And what about the beans part of the name? Let me guess. Coffee. You got it. It represents a freshly brewed cup of morning coffee. It's meant to kickstart your early morning with energy. Now, privacy

03:06

is obviously a massive question right here. You're giving an app access to your entire digital life. That requires an enormous amount of personal trust. It definitely requires a massive leap of faith. But according to Google, the data stays localized to you alone. You maintain total control over your own personal information. Right. You can delete your entire history at any given time. Right now, consumer access is still pretty limited, though. Yeah, it's only available to U.S.-based

03:34

users at the moment. And specifically, you need a Google AI Ultra subscription to use it. But there is a public waitlist open right now. Yep. Anyone with a personal Google account can actually join it. You can wait in line to see your life cartoonified. I'm curious about the underlying psychology of this tool, though. What's the actual goal of this quirky little app? Google essentially wants to give you a quick burst of inspiration. The idea is to deeply motivate you with these

03:59

personalized stories. Then you confidently put the phone down to live your life. So a digital push to go live your actual life. Let's move away from personal data curation for a moment. We really need to look at the massive industry consequences here. Yeah, because we rely so incredibly heavily on these intelligence systems now. The shift from personal tools to global infrastructure is staggering. It completely changes how we interact

04:23

with information daily. Anthropic recently shared a very interesting prompt system regarding this. It's specifically designed to keep humans actively thinking alongside AI. They're growing increasingly worried about our collective cognitive habits. Extremely worried. Letting an AI entirely think for you is quite dangerous. It can quietly weaken your own critical judgment over time. I still wrestle with prompt drift myself, honestly. I catch myself letting AI do my deep thinking.

04:53

It honestly happens to everyone who uses these tools regularly. You just slowly start outsourcing your own critical thought processes. Right. And Anthropic desperately wants you to remain actively part of it. Yeah, they want you in the cognitive loop. We're also seeing intense new rules emerging globally. Over in the UK, new regulations are pushing back on Google. Content publishers are finally demanding more control over their work. They're tired of their data being endlessly scraped

05:18

for free. For the first time, they can actually block their content. They can stop it from appearing in AI overviews completely. And they can also block it from AI mode entirely. Right. And block AI-generated search answers as well. It's a massive shift in intellectual property control and digital rights. Speaking of massive shifts, let's talk about Amazon for a second. They are rolling out a very strange new feature right now. Oh, yeah. When you search for items, things

05:43

might look quite different. You might start seeing AI-generated product images first. These appear before you even see the real physical products. They're showing fake product concepts to help you shop. Yes. Fake product images to help you find real ones. I have to push back on this Amazon feature, honestly. Why look at fake products to find real ones? It sounds weird, I know. It seems completely counterintuitive to the whole shopping experience. If I want a toaster, I want

06:10

a real toaster. I don't want an AI hallucination of a toaster. It really does sound deeply confusing on the surface. But they want to help you visualize abstract lifestyle concepts. They think it helps you narrow down a specific aesthetic faster. Well, Google is also aggressively dealing with fake content right now. They're rolling out AI -powered fake call detection software. Which is becoming a highly crucial security feature right now. It can intelligently check if a trusted

06:37

contact is really calling. It verifies if the familiar voice on the line is authentic. Voice cloning has become a massive global security problem lately. Scammers can perfectly mimic the voices of your loved ones. This defensive feature is now available on Android 14 devices. It runs entirely natively on Android 14 plus devices globally. Doing all of this requires an unbelievable amount of money, though. The financial scale of this specific industry is

07:04

just staggering. Alphabet recently announced plans to raise $80 billion. They're expanding their AI infrastructure as rapidly as humanly possible. They actually expect to spend up to $190 billion. That's just on AI-related capital expenditures for this year alone. The broader tech sector numbers are even crazier to comprehend. Tech giants could collectively invest an estimated $700 billion. That's nearly a trillion dollars

07:32

in raw physical infrastructure. It requires massive data centers, cooling systems, and enormous power grids. With all that power, societal impact becomes a huge concern. Why is Anthropic hiring an AI rule and law team for 345K? They urgently need to study how AI affects our society. Specifically, they're researching the impacts on courts and global elections. Building guardrails before the system breaks our civic institutions. We just talked about Alphabet spending $190 billion.

08:03

What does that kind of astronomical money actually buy? Well, it buys immense computational power and vital research breakthroughs. It completely changes the physical limits of what computers can do. It buys the incredible ability to shrink massive cloud power. You can now fit that power locally on your own laptop. This brings us directly to a major open source release. Google DeepMind just officially dropped the Gemma 4 12B model. This model handles complex agentic workflows

08:30

entirely locally. You can run it directly on your own personal machine. It's the exact model many developers have been waiting for. It bridges the huge gap between massive clouds and local hardware. DeepMind completely re-engineered how the model actually processes information. They fundamentally changed how it sees and hears the world. Vision and audio now flow directly into the main backbone. They no longer have to

08:55

be awkwardly translated into text first. This drastically cuts latency across the entire computing system. It also reduces the heavy memory usage quite significantly. This is their very first mid -sized model to handle audio natively. This multimodal data processing is absolutely fascinating to me. It's like stacking Lego blocks of data. Yeah, that's a great way to describe it. You connect text, vision, and audio directly together

09:18

without translating them. They simplified the underlying architecture so much to achieve this. Raw audio signals project directly into the exact same space. They seamlessly occupy the exact same space as text tokens do. This allows it to translate and transcribe entirely offline. It doesn't need to ping a massive server farm anymore. Imagine native raw audio processing running entirely offline on a regular consumer laptop. It's a truly massive leap forward for local computing

09:49

power. The benchmark performance is also shockingly impressive for its size. It performs very close to Google's much larger, heavier models. Specifically, it rivals the massive 26B mixture of experts model. But it still comfortably runs locally on standard consumer laptops. That kind of efficiency was practically unheard of last year. It also features something called MTP drafters built right in. This keeps the generation of complex text incredibly fast. Let's quickly clarify that

10:18

concept for a moment. What are MTP drafters exactly? They fundamentally change how the AI writes its responses. Helpers that guess upcoming words faster to speed up the whole system. Exactly. It predicts multiple tokens at once to save crucial time. It doesn't waste time agonizing over a single word choice anymore. The absolute best part is how accessible this model is. It's completely open under a permissive Apache 2.0 license.
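
To make the drafter idea above a little more concrete, here is a minimal toy sketch of draft-and-verify decoding in Python. It is not Gemma's actual implementation, which the episode does not describe in detail. The names `target_next`, `draft_next_k`, and `generate`, and both word lists, are hypothetical stand-ins: a cheap "drafter" guesses several tokens at once, and a slower "target" model keeps the guesses that match and corrects the first one that does not.

```python
from typing import List, Tuple

# Hypothetical "ground truth" that the slow, accurate target model would produce.
TARGET = ["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
# Hypothetical drafter output: mostly right, but wrong at one position ("cat").
DRAFT = ["the", "quick", "brown", "cat", "jumps", "over", "the", "lazy", "dog"]


def target_next(context: List[str]) -> str:
    """Toy target model: returns the correct next token for this context."""
    i = len(context)
    return TARGET[i] if i < len(TARGET) else "<eos>"


def draft_next_k(context: List[str], k: int) -> List[str]:
    """Toy drafter: cheaply guesses up to k upcoming tokens in one shot."""
    i = len(context)
    return DRAFT[i:i + k]


def generate(max_tokens: int, k: int) -> Tuple[List[str], int]:
    """Draft k tokens, verify them against the target, keep the good prefix.

    Returns the generated tokens and the number of verification rounds.
    Each round stands in for one batched pass of the target model.
    """
    out: List[str] = []
    rounds = 0
    while len(out) < max_tokens:
        rounds += 1
        draft = draft_next_k(out, k)
        accepted: List[str] = []
        for guess in draft:
            expected = target_next(out + accepted)
            if guess == expected:
                accepted.append(guess)      # drafter was right: keep it
            else:
                accepted.append(expected)   # drafter was wrong: take the fix
                break                       # and discard the rest of the draft
        if not draft:
            # Drafter had nothing to offer: fall back to one target step.
            accepted.append(target_next(out))
        out.extend(accepted)
        if out[-1] == "<eos>":
            break
    return out[:max_tokens], rounds


tokens, rounds = generate(max_tokens=9, k=3)
print(" ".join(tokens))
print(f"{rounds} verification rounds for {len(tokens)} tokens")
```

In this toy run the output matches what the target alone would have written, but it takes fewer verification rounds than tokens, which is the speed-up the hosts are describing. Real systems verify drafts with model probabilities rather than exact string matches.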

10:43

The weights are on Hugging Face right now. You can also find them hosted on Kaggle immediately today. You can spin it up instantly in familiar developer tools like Ollama, LM Studio, and llama.cpp. The developer community is already building truly amazing things with it. The open source world moves at an absolutely blistering pace. There is another tool mentioned alongside this release. What exactly is the Gemma Skills Repository mentioned in the source? It's an official developer

11:10

toolkit launching alongside the main model. It helps you build autonomous, multi-step agents right out of the box. Basically a starter kit for building your own offline AI workers. We're in a highly weird transitional phase right now. The tech industry is spending nearly a trillion dollars collectively. They're rapidly building out massive, highly centralized cloud infrastructure globally. They desperately want to hoard our

11:35

cloud data for morning cartoons. Apps like DreamBeans rely entirely on this massive centralized ecosystem. They need your data living on their servers to function properly. But they're simultaneously moving rapidly in the exact opposite direction. They're giving away the keys to the kingdom at the same time. They're democratizing offline open source power globally with Gemma 4. You can now run incredibly smart agents completely on your laptop. The deep tension between centralized

12:02

cloud and local privacy is fascinating. We're watching two entirely different philosophies battle for dominance right now. It really makes you wonder about the long-term future of these tools. If models like Gemma 4 get so incredibly good natively, if they can understand our daily reality completely offline, will we eventually abandon cloud-dependent apps like DreamBeans completely? Might we trade them for completely

12:27

private, entirely local AI companions? It's a profoundly important question about who truly controls our data. And it's a question we'll have to answer very soon. Thank you so much for joining us on this deep dive. We really appreciate you exploring these complex ideas alongside us today.

Transcript source: Provided by creator in RSS feed