🎙️ EP 59: The Smartest AI Just Dropped (But Can You Afford It?)

00:00

Okay, imagine this for a second. 96%. That's the figure for how much AI information, you know, when it cites its sources, comes straight from corporate blogs, PR, or journalism. Wow. Not ads, not random corners of the web, but a really specific sort of curated content. Kind of makes you rethink where AI gets its smarts, doesn't it? And maybe even how you'd create your own content. It really does put those AI report cards

00:25

in a different perspective. It's like finding out your super smart friend basically just reads very specific magazines and company websites. You wouldn't guess AI had such particular tastes. Welcome, listener, to this deep dive. We're digging into the latest in AI, pulling directly from a recent newsletter that was just packed with data. Our mission today is pretty simple: unpack it all, find those surprising bits, and basically give you a shortcut to getting up to speed. That's

00:51

the plan. We're going to pull back the curtain on which AI models are actually, you know, winning right now. We'll look at practical new features you can use. And yeah, we'll definitely get into that fascinating study on where these digital brains really get their info from. Prepare for a few aha moments, I think. So today, first up, we'll look at the actual performance data. Which models are winning specific races like coding or math or image generation? It's often not quite

01:21

what the headlines shout about. Then we'll hit some highlights from today in AI, quick hits on new features, some policy stuff that's making waves, and even where the big money is moving in the AI world. And finally, we'll really explore that new study on AI sourcing. Seriously, it might just shift how you think about putting content out there online. Okay, let's dive into this first part. For a while, the AI scene felt a bit like, well, a popularity contest maybe.

01:44

But now we're getting solid data, actual benchmarks showing who's best at what. Exactly. And it's not just changing fast. It's getting super specialized. You've got Grok 4 jumping onto the LMArena leaderboard, making noise, especially with math and reasoning. But then, almost under the radar, Gemini 2.5 Pro is just sweeping up wins in category after category, kind of like the quiet horse that keeps winning. And if we look at specific tasks, yeah,

02:08

the winners get really clear. For general text stuff, Gemini 2.5 Pro is basically top dog right now on that leaderboard. Yeah. ChatGPT-4o and OpenAI's o3 are tied for a solid second place. Grok 4, which has had a lot of buzz, is actually tied for third with GPT-4.5. So it's good, definitely, but not totally dominating text tasks. And what's fascinating, like you said, is that specialization. Take coding. There, it's a three-way tie right at the top.

02:36

You've got Gemini 2.5 Pro, DeepSeek R1, and Claude Opus 4, all tied. And Grok 4? For coding, it's way down in 14th place. That's a huge difference from its overall intelligence score. It really hammers home the point. Just being smart overall doesn't mean an AI is great at everything. That's a really key point. But then Grok 4 flips the script for math. It's the undisputed champ there. Gemini and Claude are right behind it, but Grok takes the crown. So it's clearly tuned for numbers.

03:03

For complex reasoning, Gemini 2.5 Pro and ChatGPT 4 are leading, and Grok 4 is still strong in third place there. And when you look at, you know, understanding images, Gemini 2.5 Pro leads again. Then you've got GPT-4.5, o3, and 4o all tied for second. Okay. And for search tasks, Gemini 2.5 Pro actually ties with Perplexity Sonar Reasoning Pro. It really shows Google's strength in those visual and search-heavy areas, which, I mean, makes sense, right? That's their

03:29

backyard. Yeah, absolutely. Now, here's where it gets really interesting for me. Image generation and editing. OpenAI has this new model, GPT-Image-1, and it's just completely dominating. It's carved out its own space there doing some incredible things. Oh, I mean, just imagine trying to scale up running a billion queries. You'd want these specialized models, wouldn't you? Each one just nailing its specific job. It means picking the right tool isn't just about hype anymore. It's

03:54

a real strategic choice. Yeah. This really shows AI intelligence isn't one score. It's more like a mosaic, you know, lots of different strengths. And we have to talk about the price tag, right? That's the reality check. Grok 4 might score high on intelligence, but it's also $6 per million tokens. And remember, tokens are kind of like the words or word parts the AI processes. That's a premium. Right. If you need the absolute smartest and budget isn't the main issue, Grok 4 is up

04:21

there. But let's say you want, like, 95% of that performance for maybe half the cost. Gemini 2.5 Pro or o4-mini look really solid. It boils down to cost efficiency. And if you want the smartest AI for the lowest cost, DeepSeek R1 actually takes that spot. So for you listening, the takeaway here is pretty clear. Choosing an AI model now is less about the overall buzz and way more about matching it to your specific task and, well, your wallet. OK, let's shift gears

04:49

a bit. Let's get into some of the faster moving developments, new features, policy stuff, business news. This is our Today in AI segment. First off, have you seen this Warren Buffett mega prompt floating around? I think I saw something about that. Yeah. Yeah. It claims it can turn ChatGPT into like an expert market analyst. Sounds amazing, right? But you've got to remember, even with the world's best prompt, GPT doesn't have a CFA. It's not a chartered financial analyst. Right.

05:17

It's a tool maybe for analysis, but it's definitely not a replacement for actual human financial expertise. You still need your own judgment. Absolutely. That's such a critical point, especially with money advice. AI can help, but it doesn't replace expertise or critical thinking. Now, something maybe more practical day to day. ChatGPT's ghost mode. Oh yeah, the privacy feature. Exactly. If you just want a quick chat, don't want your info saved or used for training, this

05:44

is great. It's free, it's private, and it forgets everything the second you're done. Like a digital Etch A Sketch for conversations. And speaking of making it more personal, ChatGPT also got moods. You can actually tell it how to act, how to talk, think, its whole vibe. Imagine setting it to Cynic if you need someone to poke holes in your ideas late at night. Or maybe Sage if you're looking for something more profound. It makes the AI feel, I don't know, more adaptable.

06:09

And speaking of adaptable or maybe just fast, Google dropped Gemini 2.5 Flash-Lite. They're calling it their fastest 2.5 model yet. Yeah. And the cost? Just ten cents per million input tokens. Wow, that's cheap. Yeah, really cheap. So it's aimed at high volume stuff where you need speed and low latency above all else. OK, now shifting to policy, there was a pretty big move in the U.S. recently. Our source material says that Trump signed an order aiming to ban

06:39

woke AI models from federal contracts. The order says models need to be truth seeking and ideologically neutral. And the source specifically notes that xAI might benefit most from this. Just relaying what the source reported here, obviously not taking a stand. Understood. And on the finance side of AI, a big funding round for Gupshup. They just raised $60 million, mostly equity. They're looking to grow in India, Latin America, places like that. They handle a ton of messages,

07:05

right? Yeah, over 120 billion messages a year for like 50,000 businesses. And there's even chatter about a potential IPO in the next year and a half, two years. So big money is definitely flowing. Ultimately, what stands out here is AI getting more personal, more adaptable, but also really specialized for certain needs and business cases. Right. Let's keep the pace up. Let's do a quick run through of some interesting new AI tools and other industry buzz that's changing

07:29

the game. Okay. Yeah. We're seeing this explosion of new AI tools, right? Designed for very specific jobs, making things easier. Like Bonsai. It claims it can turn any document into an engaging video course in just 15 minutes. 15 minutes, seriously? That's the claim. Think about that for training or education or just explaining something complicated fast. Yeah, that's potentially huge. And then there's Product Fit. This one creates web content, ad stuff, and even gives you a dashboard showing

07:59

user interest in real time. It's all about figuring out your audience and reaching them better, taking out some of the guesswork. And for designers, there's AI Image Maker. It's a free online tool for design and image processing. Sounds pretty powerful. And another one, Turbo Style, lets you tweak styles, swap visuals, try out design ideas for websites. These tools really feel like they're spreading creative power around. Making sophisticated stuff more accessible, yeah. Okay,

08:24

some quick industry hits. Veo 3. It's apparently making pro-level commercial concepts now, and the descriptions say they're shockingly realistic. Wow. The quality leap in video generation is getting kind of scary, making it harder to tell AI from reality. No kidding. And Anthropic, you know, the company behind Claude, they've quietly hit over $4 billion in annual recurring revenue.

08:49

$4 billion. Quietly. Yeah. Apparently leading the B2B AI space. That's a massive number, and it shows how much businesses want these AI solutions. Definitely. Also, this is interesting. Google's AI is apparently creating a zero-click web. Basically, you get the answer right in the search results. No need to click a link. Right. I've seen that more and more. The source we read suggested this could mean maybe 10% of users just stop browsing further because they got their answer right there.

09:16

That's a huge shift for content creators who rely on those clicks. It's like Google's becoming the destination, not just the map. That has enormous implications. And speaking of Google, Google Photos is adding AI features too, letting you remix photos, turn pics into videos. Meanwhile, the talent shuffle continues. Microsoft apparently hired around two dozen people from Google's DeepMind AI lab. The talent war is definitely still hot.

09:40

Yeah. You know, I still wrestle with prompt drift myself sometimes, where the AI just kind of forgets what you asked it initially after a long chat. Trying to keep up with all these new tools and how they keep changing, it's a lot. Tell me about it. But the takeaway from all this rapid change is clear, isn't it? The AI landscape is just incredibly dynamic. There are new tools popping up for pretty much every niche you can think of. Sponsor. Okay, let's get to our final

10:04

segment. And this one I think is really fascinating, especially if you create content online or even if you just care about where AI gets its information from. We're going to dive deep into how these models actually source what they tell you. Yeah, this is so crucial. It helps us peek under the hood, understand the knowledge base these powerful tools are built on. Muck Rack published this huge study that analyzed over a million citations from ChatGPT, Claude, Gemini. This isn't guesswork,

10:29

it's actual data on what AI reads. And the big headline finding from the study is, well, it's pretty striking. 96% of the citations in AI answers come from content that started as corporate communications, PR, or journalism. 96%? Almost everything cited comes from what we used to call earned media, or maybe company-owned blogs and stuff. Not from ads. Wow. Just think about that. If you produce any kind of information online,

10:55

that directly impacts your strategy. The study even broke down citations by the type of task the user was doing, which gives even more insight. Like, if people were asking for advice, 48% of citations were from corporate blogs. Almost half. And if you ask for step-by-step instructions, 47% blogs, 23% Wikipedia. If you need help with a creative task, 43% blogs, 26% journalism. It really underscores the power of good old-fashioned blog content as an AI knowledge source. Yeah,

11:22

it's surprisingly dominant. When it came to comparing things, 35% of citations were from aggregators like Wikipedia, 33% from blogs. Factual lookups, similar split, 33% aggregators, 33% blogs. But when it came to current events, journalism was king. 49% of citations from big news sources, Reuters, AP, Financial Times. So the quick summary, the TL;DR. It's pretty clear, right? Blogs rule for casual questions and how-tos. News sources

11:52

dominate for what's happening right now. It paints a really specific picture of how AI consumes and cites information. And what's also really interesting is that the models have different habits. ChatGPT, the study found, cited journalism the most, lots of mainstream news. Claude, on the other hand, tended to lean more on academic papers, government sources, technical documents. Huh. So they have different appetites. Yeah, different preferred feeding grounds, you could

12:15

say. The takeaway for content creators here feels pretty significant. The old SEO tricks? Probably not enough anymore. The study even mentioned that putting podcasts on YouTube might give you better odds of being cited by AI. Interesting. So format matters too. Yeah, it's not just keywords. It's about how accessible and structured and authoritative your

12:34

content seems to these models. The biggest lesson here, I think, for anyone creating content today is that the format and the original source really matter for whether AI will even find it, let alone cite it. Okay, so let's try and tie this all together. What does it all mean? We've seen today that AI intelligence isn't really one single score, is it? It's much more like a mosaic of different strengths, specialized skills, and

13:00

each often has its own price. You really have to think about your specific needs and budget. And we're getting more ways to interact with AI, too. More nuance, from giving it a mood to trying out these incredibly powerful new tools for pro-level video or images. The speed of it all is just blistering. Yeah. And maybe the most fundamental thing we talked about is getting a clearer picture of where AI's knowledge actually comes from. It's not some kind of magic, right?

13:24

It's built on this huge ocean of human information, but predominantly, it seems, from corporate blogs and established journalism. That's a really big realization about our digital knowledge system. This whole deep dive really shows that, yeah, the tech is powerful, but understanding the details, the strengths, the weaknesses, even the learning habits, that's the key to actually using it well. It definitely raises a big question, doesn't

13:46

it? Yeah. If AI is learning so much from current content, especially blogs and news, what does that mean for how information lasts over time and how trustworthy it is in this digital age we're building? Yeah, something to think about next time you get an AI answer. Where did that really come from and what does that imply about its context, maybe even its accuracy? We really hope this deep dive gives you some fresh perspective, maybe some useful insights you can act on. You've

14:12

been listening to The Deep Dive. Thanks so much for joining us. We'll be back soon with another one. Out to your music.

Transcript source: Provided by creator in RSS feed