🎙️ EP 85: ChatGPT Surrounded, Claude Hacked, and a $405M Robot Brain

Aug 28, 2025 · 14 min

Episode description

The AI world just flipped. ChatGPT’s once-clear lead? Fading fast. A hacker used Claude to run a full-blown cybercrime spree. And a new benchmark just crushed most “smart” research agents.

We’ll talk about:

  • Google Gemini & Grok catching up to ChatGPT in real-time
  • How a hacker used Claude to extort 17 companies — end-to-end
  • AstaBench: the new gold standard for testing science agents
  • Why a physics-powered robot brain just raised $405M (and what it can actually do)

Keywords: ChatGPT, Gemini, Grok, Claude, AstaBench, FieldAI, AI benchmarks, agentic workflows, Claude hack, GPT-5

Links:

  1. Newsletter: Sign up for our FREE daily newsletter.
  2. Our Community: Get 3-level AI tutorials across industries.
  3. Join AI Fire Academy: 500+ advanced AI workflows ($14,500+ Value)

Our Socials:

  1. Facebook Group: Join 252K+ AI builders
  2. X (Twitter): Follow us for daily AI drops
  3. YouTube: Watch AI walkthroughs & tutorials

Transcript

ChatGPT once looked pretty much invincible, didn't it? But now it finds itself, well, surrounded. It's this surprising, really intense shift happening right now in the AI landscape. We're talking new challengers, global power plays, and the

real AI war, you know, often unfolding right there on your phone screen. Welcome to the deep dive. We're here to unpack the latest insights from your sources. Yeah, and today we're plunging into this AI frontier, looking at where the industry stands, maybe where it's heading. And it is moving fast. We'll, uh, explore the intense competition, you know, from the tech giants we all know to these, um, stealthy startups you might not have even heard of yet. And we'll also dig into some

truly unexpected real-world applications. And this is crucial, I think, advancements in AI safety and reliability. It's such a dynamic space. Lots to consider. So, OK, what does this all mean for you listening? We'll map out the current state of play, see how AI is truly evolving beyond just those few big names and what innovations might be coming next. So let's unpack this first part. The a16z report, that's Andreessen Horowitz, the VC firm, their latest report is, well, it's

a bit of a game changer. It shows ChatGPT's dominant lead, which, you know, once felt kind of insurmountable, it's actually shrinking. It is. The competition is catching up, and fast. What's truly fascinating here, I think, is just how quickly this landscape can shift. Like you blink and things look different. Gemini, for example. It's not just a player anymore. It's now the number two AI app on mobile. Number two. Wow. And it's captured, get this, 90% of

the Android market. 90! Think about that penetration. It's not just a big slice. It's like practically the whole pie on Android. Right. And Grok, you know, from xAI. That isn't just some niche thing either. No. It's quickly become a top five web AI tool and number 23 on mobile with, what, over 20 million users already? Yeah, that's exploding growth. I mean, it's fueled by that integration into X, right? Access to real-time info. Exactly. It really shows the power of platform integration.

Meanwhile, you see Claude, Anthropic's AI, it's maintaining steady web growth. Solid performer there. Okay. But surprisingly, it's flat on mobile. Just hasn't taken off the same way. Interesting. And then there's Perplexity AI, kind of quietly, almost stealthily climbing in both web and mobile. Ah, yeah, Perplexity. Yeah, they're building a strong user base by focusing on answer accuracy and showing their sources. And then there's this global element, which is maybe surprising to

some. Oh, yeah. This isn't just a Silicon Valley game anymore, is it? 22 of the top 50 mobile AI apps are from China now. 22. That's almost half. Doubao from ByteDance, for instance, already number four on mobile. Quark from Alibaba is number nine on the web. It really signals a truly global arms race in AI. You've got diverse players bringing different strengths, different approaches. Exactly. And we're also seeing new, kind of intriguing players pop up, like Lovable and Replit, making

their first appearance on these lists. Lovable, that's the vibe coding startup. Yeah, helping users create code based on like emotional cues or aesthetics, making coding more intuitive. And Replit's the collaborative code editor with AI built in. Right, speeding up development. And others like PixAI and AI Mirror, focusing on images. They feel like they're right on the

edge of breaking through, you know? Mainstream adoption feels... close for them. So, okay, thinking about all this competition, especially on mobile. Yeah. What does that intense focus really signify for, like, how we'll interact with AI day to day? Well, I think it shows that ubiquitous, you know, personalized AI, it's rapidly becoming just an indispensable part of our daily digital lives. Mm-hmm. Yeah, makes sense. And if we connect this to the bigger picture, AI isn't just for chatbots

anymore or just generating text and images. Its reliability is being seriously tested now in really critical real-world areas. The Washington Post, for instance, ran this huge test. Oh, yeah, I saw that. 900 answers across nine major AI tools just to see what actually works, you know, not just in theory, but under pressure. That's a genuine stress test, way beyond the usual benchmarks. And speaking of real-world impact, remember PJ Ace's viral AI ad for Kalshi? Oh, yeah. That

was impressive. Well, he did it again, crafted another super realistic video, this time for David Beckham's company, IM8. Wow. These aren't just clever tricks anymore. They're showing AI's ability to create stuff that looks almost exactly like live action video. It's transforming content production. And what's maybe equally remarkable is the collaboration that's starting to happen between these competitors. Like Anthropic and OpenAI? Exactly. They just audited each other's

AI models for safety risks. That's a first-of-its-kind move in this industry. That is significant. Yeah. It signals maybe a maturing approach to development where safety isn't just internal. It's a shared, auditable thing. Building collective trust, maybe. It absolutely shows a maturing industry. I agree. And then in education, Anthropic analyzed, what, 74,000 educator chats with Claude? 74,000. Yeah. That wasn't just a survey. It was a deep dive into how professors are kind

of quietly but significantly adapting to AI in higher ed. Integrating it into teaching, curriculum, often behind the scenes. And on a totally different scale, Google DeepMind's experimental weather AI. It just... nailed its first real-world test. Oh, the hurricane prediction. Yeah, predicted Hurricane Erin's path more accurately than the traditional methods by, you know, analyzing vast amounts of atmospheric data, spotting patterns maybe missed before. I mean, imagine scaling

that. Right. Predicting extreme weather precisely. With that accuracy, it could literally save so many lives. Think about disaster preparedness. Yeah, the global impact is huge. Indeed. And on the funding side, we're seeing massive investments in these practical applications, too. FieldAI just raised $405 million. Wow, $405 million. What's their goal? Get this. Build one universal robot brain. One brain that works across all robot types. Ambitious. Very. Their physics-based

AI, it understands real-world forces, movement. It's already reducing risk in energy, delivery, construction. Okay, so when we look across all these different uses, weather, robotics, safety audits, what's the common thread here about... AI's readiness for, you know, real impact? I'd say AI is really proving its diverse utility now and its increasing reliability across a whole bunch of critical sectors. Here's where it gets really interesting, I think, for people

wanting to actually use AI themselves. Can GPT-5, you know, one of the newer models, actually create profitable trading strategies? Ah, the million-dollar question. Yeah. Or maybe billion-dollar. Right. A review on TradingView put it through a pretty brutal test, real market data. So we're asking where does it shine and where does it completely fall flat in finance? Well, this brings up such an important question. How do you even tell an AI what you want it to do, especially

when money is involved? Yeah, the prompt matters. Exactly. That's prompt engineering. It's the art, maybe the science, of crafting really precise, clear instructions, like writing a super detailed recipe for a chef, you know. Good analogy. It's becoming absolutely key to making money with AI, to consistently getting something valuable out of it. I have to admit, I still wrestle with prompt drift myself sometimes, you know, getting exactly what I need consistently. It's a learning

curve for everyone, I think, even us. Oh, for sure. But for those looking to start, there's actually a seven-day AI business launch plan out there now. A full roadmap. Yeah, complete step-by-step from just an idea to a launched company. Covers market research, deployment, the whole thing. We're also seeing these highly specialized AI tools emerge, redefining what's possible for creators. For example, Gemini 2.5

Flash Image. The image generator. Yeah, it creates images with incredible detail, down to like what they call a nano-banana. A nano-banana. Right. This is a SOTA image model, state-of-the-art. The absolute cutting edge. Allows for hyper-detailed product stuff or, you know, microscopic scientific images. Then there's Codex, which is pretty amazing too. Builds web apps in minutes. For free. Using like 27 different AI models, right? Yeah, handling different parts of development.

And one part turns content into interactive chat pages. No code needed at all. Wow. And Screenshot Reports creates branded reports just from screenshots, streamlining something that's usually pretty tedious. Okay, so thinking about all these tools, prompt engineering, no-code builders, specialized models, what's the fundamental shift happening for creators and entrepreneurs here? I think AI is really democratizing innovation. It's allowing pretty much anyone to build and create, often

with absolutely no code required. So, okay, how do we make AI feel more human, more relatable when we interact with it? There's this viral system prompt for ChatGPT going around. Oh, yeah. What does it do? It basically teaches the AI to speak more like a real person, injects conversational nuances, idioms, even a bit of personality. Makes it feel less robotic. Huh. Interesting. Like giving it conversational coaching. Exactly. And on the video side, things are moving fast too.

Alibaba just updated its video AI, film-quality avatars, they're calling them. Making AI characters look almost real. Stunningly real, yeah. Almost indistinguishable from live actors. And Google Workspace now has a free AI video tool called Google Vids. Puts easy video creation into more hands. But AI isn't all positive, right? There are challenges. It's a pretty stark one. Definitely. Stanford University recently found AI actually killed 13 percent of jobs for 22-to-25-year-olds,

specifically in coding and support roles. 13 percent. That's significant. It's a stark reminder of the disruptive power affecting real people, highlighting that need for workforce adaptation. And sadly, we've seen a darker side already. A hacker recently used Claude, Anthropic's AI, to run a full-scale cybercrime operation. Seriously? God. It hit 17 different organizations. It just clearly shows the critical need for robust security, ethical safeguards, preventing these powerful

tools from being weaponized. Which really raises the question, how do we properly test these incredibly powerful AI agents, test them in a way that reflects real-world performance? Well, AI2, the Allen Institute for AI. They just dropped AstaBench. AstaBench. Okay, what is it? It's a new benchmark suite. But a really rigorous one. It's built specifically to test scientific AI agents. Uses over 2,400 real research tasks. Wow, 2,400 real tasks. That's way beyond simple chatbot

stuff. Oh, yeah. And AstaBench focuses on what they call five brutal truths about agent testing. Okay, like what? Well, first, tasks have to be real-world and genuinely hard. Second, tool use needs careful control. Third, you've got to track the cost. Accuracy isn't everything if it's bankrupting you. Fourth, standardization is key for comparing apples to apples. And fifth, you can't claim progress unless you know exactly what you've actually beaten. Really practical fundamental

points. And they tested a lot of agents. 57 agents across 22 different architectures. Huge sample. Their own agent, Asta v0, scored 53%. ReAct plus GPT-5 got 43.3%. Okay, so still room for improvement. Definitely. And interestingly, data analysis, still pretty weak across the board. No one scored above 34% there. That's a major area needing work. And what about GPT-5's impact in those tests? Also interesting. It actually helped general agents, like the ones using that ReAct framework

for reasoning and action. But it oddly hurt specialized ones, which suggests maybe it's tuned specifically for certain workflows like those ReAct-style operations. This is definitely the benchmark to watch if you're building AI for critical fields, research, medicine, engineering, where trust and reproducibility are everything. So boiling it down, what's the biggest takeaway from AstaBench for you and maybe for anyone relying

on AI for really important tasks? I think it's that rigorous, truly real-world benchmarks are absolutely essential if we're going to build trustworthy and reliable AI, period. So if we try to connect all of this, tie it together, three things really stand out from today's deep dive, I think. First, the AI competition is just incredibly intense now, and it's truly global. ChatGPT isn't alone at the top anymore. And that battleground for dominance, it's decisively

shifted to mobile devices. That's where the action is. Second, AI's practical applications are just expanding so rapidly. They're touching almost every part of our lives in really tangible ways now. From predicting hurricanes, transforming education, providing these powerful business tools. It's integrated. It's everywhere. It's not just niche tech anymore. Exactly. And third, maybe most importantly, the need for robust testing, industry-wide safety audits, clear ethical guidelines.

Yeah. It's more critical than ever before. Yeah. As AI gets more powerful, more deeply integrated into our lives, we just need smarter ways to ensure it works reliably, safely, and responsibly for everyone. So the big question then, as AI keeps evolving so quickly, how do we make sure we're building trustworthy, responsible systems? Systems that actually benefit humanity rather than introducing new risks. And maybe related. What's your role in understanding this rapidly

changing landscape? How will you adapt your own skills, your own expectations as this continues? We really encourage you to keep exploring, keep asking questions. This deep dive into your sources clearly shows the future of AI, well, it's still being written. And it's a story we're all a part of, really. Thank you for joining us on this deep dive. We'll be back soon with more insights. Until then, keep digging. Outro music.

Transcript source: Provided by creator in RSS feed: download file