🎙️ EP 79: Can AI See the Future? How o3-Mini Crushed Human Traders

00:00

So everyone kind of thought AI would replace writers first. Right. Or maybe coders. That was the next big thing. But what about forecasters? Could AI actually predict real-world events? Yeah, that's a wild thought. Are these models maybe already seeing possibilities that, you know, we just can't? Mapping things out? Well, welcome to the Deep Dive. Today we're going to dig into some really fascinating source material

00:25

here. Our mission really is to unpack how these advanced AI models are doing more than just, like, understanding language. They're actually making these nuanced bets on the future, probabilistic bets. Okay. So we'll look at models in live prediction markets first, then we'll shift over and talk about this quiet giant, a new frontier model that just sort of appeared. Oh, interesting. And finally, we'll touch on some of the AI tools

00:48

that are, you know, already changing how we work every day. Get ready for some pretty eye-opening stuff, I think. All right, let's jump into that first big idea then: AI models stepping into prediction markets. There's this new platform from the University of Chicago, and it puts top AI models right into these live markets. So, like, real money, real events? Seems like it, or at least simulated stakes that mirror real markets. Think of them as these betting pools for what's

01:16

going to happen next. Okay, like what kind of things are they betting on? All sorts. Who will win the next election, where crypto prices might land, even the outcome of an MLS soccer game. Wow. OK, so how does it work? Do they just say Team A wins? No, it's more subtle. They make probabilistic bets. So instead of just "X happens," it's more like, okay, I think there's a 70% chance of X happening. Gotcha. So they assign

01:41

odds. Exactly. And what's really cool is they apparently also provide explanations for why they made that bet. Gives you a peek under the hood. Okay. That reasoning part is key. So what are the early results? Anything surprising? Oh, yeah. Definitely surprising. Take OpenAI's o3-mini. It apparently turned $1 into $9 betting on an MLS game. Whoa, really? Well, the human market, you know, the consensus, only gave Toronto FC an 11% chance to win. Okay, pretty low odds.
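
[Editor's note] The "$1 into $9" figure is consistent with simple prediction-market arithmetic. The sketch below is only an illustration, not a description of how the University of Chicago platform actually settles bets: it assumes a binary contract that pays $1 per share if the event happens, that the share price equals the market probability (0.11), and that there are no fees. The function names `shares_bought` and `expected_profit` are hypothetical. The 0.30 value is the model's own estimate quoted in the episode.

```python
def shares_bought(stake: float, price: float) -> float:
    """Number of $1-payout shares a stake buys at a given market price.

    Assumption: the contract is binary and pays exactly $1 per share
    if the event occurs, and nothing otherwise. No fees are modeled.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return stake / price


def expected_profit(stake: float, price: float, belief: float) -> float:
    """Expected profit if the event really occurs with probability `belief`."""
    payout_if_win = shares_bought(stake, price)  # each share pays $1 on a win
    return belief * payout_if_win - stake


if __name__ == "__main__":
    market_price = 0.11  # market consensus quoted in the episode
    model_belief = 0.30  # the model's own estimate quoted in the episode

    # Payout on a $1 stake if the long shot wins.
    print(round(shares_bought(1.0, market_price), 2))

    # Expected profit per $1 staked, if the model's 30% estimate is right.
    print(round(expected_profit(1.0, market_price, model_belief), 2))
```

Under these assumptions, a $1 stake at an 11% price pays out roughly $9 on a win, and the bet has positive expected value whenever the bettor's probability is above the market price. A bettor whose estimate equals the market price expects to break even.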

02:07

Right, but o3-mini saw it differently. It put the odds at 30%. It's a pretty bold bet against the crowd. And it paid off, obviously. Nine times the money. Okay. Yeah. And you see differences between the models, too. Like, Qwen 3 was super confident about AI regulation passing, like 75% chance. That's bullish. But then Meta's Llama 4 Maverick was way more cautious on the same thing, only 35%. Huh. So they disagree quite

02:31

a bit. They do. And here's another wrinkle. GPT-5, often it's the most accurate model overall. Right. Gets the most answers correct. Exactly. But o3-mini is the one making the most profit. Huh. Okay. So being right and making money aren't always the same thing, are they? Apparently not. Then you've got DeepSeek R1. Sometimes it just went "chaos mode" is what they call it. Chaos mode? What's that mean? Betting 0% on everything. Yeah. Just flat zeros. Okay, that sounds like

02:57

a terrible strategy. You'd think. But somehow, it still made money by hitting some major upsets that nobody else saw coming. It's kind of wild. That is wild. Almost spooky. Any other big calls? Yeah, one notable one. Llama 4 Maverick was the only model, apparently, to predict the Zohran upset. Oh, I remember that, the local election surprise. No human pollster saw that coming. Right. And interestingly, Anthropic's Claude models, they were just absent, not really showing up

03:26

on the leaderboard. Hmm. Strange. Wonder why. Good question. The bigger implication here. Hmm. What does this all really mean? Yeah. What's the takeaway? Well, these models are making bets on things like the 2028 election already. Wow, okay, looking way ahead. And their predictions don't line up with current human polling data. Not even close sometimes. So they're seeing something different. It really suggests that. Maybe some models have an understanding, like a world model,

03:51

that we humans just aren't quite getting. [Two-second silence.] This could be maybe the cleanest test we've seen yet of how well AI can reason about the real world. Right, because it's tied to actual outcomes, not just language tasks. Exactly. And if this holds up, well, Wall Street might need to pay attention. Anyone who relies on forecasting, really. Yeah, big time. So probing question then. What's the biggest implication if these models do consistently outperform human

04:21

forecasters? It suggests a profound shift in how we understand and use predictive insights. A really profound shift. Okay, so let's switch gears now. From betting on the future to the models that actually do the thinking. Frontier models. Right. So get this. A massive new open-source model just appeared. Like zero hype. Just quietly uploaded to Hugging Face. No big announcement. Just there. Pretty much. It's called DeepSeek V3.1. And it is a beast. We're talking 685 billion

04:50

parameters. Whoa. Okay, for listeners, parameters are kind of like the learned data points, the connections inside the AI's brain, right? More means more capacity. Exactly. And 685 billion puts it right up there with the biggest models out there, like GPT-5 and Claude 4. It's a serious contender. And it's coming out of China. And it's gunning straight for the top dogs, you're saying. Open source, though. That's the kicker.

05:10

It's apparently faster than Claude, way cheaper to run than GPT, and yeah, totally open source. Okay, that could be huge. The most capable free model out there, potentially. It really could be. Early tests, and these are independent tests, show it's matching or sometimes even beating GPT-5 and Claude 4 on real-world tasks. Got an example? Yeah. On the Aider coding benchmark, it scored 71.6%. Okay. How does that stack up? That actually slightly beats Claude Opus 4. And

05:40

here's the crazy part. It's apparently 68 times cheaper to run. 68 times? Yeah. Wow. Okay, that is a big deal for developers, for anyone trying to build things with AI. Huge deal. And what's interesting is how it does it. You know how some open models feel kind of nerfed or just bloated and slow? Yeah, sometimes they're not quite ready for prime time. Right. But V3.1 seems to be both really high performance and efficient. Fast enough for real-time stuff. Yeah, no twisters.

06:07

Any technical tricks? Well, it supports different precision formats like BF16 and FP8. Basically, ways the AI handles numbers. Lower precision can mean faster speed and less memory, sometimes with a tiny hit to accuracy, but often worth it. Okay, so more flexibility for developers to run it on different kinds of hardware. Makes sense. Exactly. Plus, it has some cool new features built in, things called thinking tokens and search tokens. Thinking tokens? Search tokens? What

06:35

do those do? So thinking tokens, they kind of let the model do more internal work, like reasoning through a problem before giving the answer, sort of like showing its work internally. Oh, okay. Like chain-of-thought prompting, but maybe built in. Kind of like that, yeah. And search tokens let it pull in live information from the web as part of its process. So it can reason and access up-to-the-minute data? That seems to

06:57

be the idea. Pretty powerful combo. And the community noticed DeepSeek had quietly updated everything, removing their older R1 model. Ah, streamlining things. Yep. And this V3.1 just shot to the top of the trending models on Hugging Face almost instantly after it was uploaded. Whoa. Just imagine the potential there. Models that can really integrate that deep internal reasoning with live external data, all at that kind of scale. Like stacking Lego blocks of data and thought together in a

07:28

whole new way. Yeah, the possibilities are kind of staggering. So here's the probing question. What does this quiet release, this powerful open model just appearing, what does that mean for AI accessibility? For everyone. It signals a powerful, cheaper, and definitely more open future for advanced AI, I think. Democratizing it a bit more. Right. OK, let's shift one more time, from these huge frontier models back to more immediate stuff. Practical tools and developments

07:56

we're seeing right now. Yeah, the day-to-day impact. Exactly. Because beyond the absolute cutting edge, AI is weaving itself into the tools we use all the time. Like Grammarly just launched eight new AI agents. Eight of them? What do they do? Think of them like specific writing helpers. They help with grading, checking for plagiarism, even spotting AI-generated text, helping you brainstorm, across the whole writing

08:18

process. Like specialized assistants. Yeah. And I got to admit, you know, I still wrestle with prompt drift myself sometimes, getting the AI to stay on track. Oh yeah, me too. It can wander off. So having these kinds of smarter assistants built right into where you're already working, that's super helpful, actually becoming essential, I'd say. Totally agree. And speaking of integration, we're also seeing new ways to

08:38

like measure AI's impact. How so? Well, Ahrefs, the SEO tool company, they just launched a dashboard that tracks web traffic coming from ChatGPT and Google's AI Overviews. Oh, interesting. So you can see how much traffic AI search is sending you. Exactly. Across like 44,000 sites already. So you can finally track AI search as a real channel, measure its ROI. That's going to change SEO for sure. Yeah, definitely. You need to know where your visitors are coming from. And another

09:08

big one, Microsoft Copilot AI. It's now directly in Excel. Wait, in Excel itself? How does that work? There's literally a new function, like SUM or AVERAGE, but it's COPILOT. You can type plain English in it, like "Copilot, summarize sales data in column C," or "Copilot, find the Q3 revenue for product X from our sales report." Wow. So you can pull data, analyze it just by typing a formula in natural language. Yeah. Pretty slick, right? Especially for people who live in spreadsheets

09:37

all day. That is slick. Okay, and any other quick hits from the AI world that caught your eye? Yeah, a few rapid-fire ones. Adobe's building out its AI features, like a new AI home with PDF tools and creation stuff. Integrating it deeper into creative workflows. Meta is apparently rebuilding its whole AI division midstream to try and catch up with OpenAI, a big internal push there. The race is definitely on. OpenAI says it's now very serious about adding proper

10:03

encryption to ChatGPT. Big deal for privacy if that happens. Yeah, that would be significant. ByteDance, they own TikTok, has a new model called M3-Agent that can process video and audio together in real time. Think smarter content moderation or analysis. Okay, multimedia AI. And NVIDIA released a new small open model, but it has a reasoning toggle. A reasoning toggle? Like you can turn its thinking process on or off? Kind

10:28

of seems like that, yeah. Maybe control how much effort it puts into reasoning versus just giving a quick answer. Cool concept. Lots happening. So zooming out slightly, how do all these smaller, more specific AI tools, these integrations, how do they change our actual daily workflows? They make complex tasks simpler, way more efficient, and honestly, often more creative, too. Yeah, taking the friction out of things, letting you focus on the bigger picture. Exactly. They handle

10:56

the grunt work. Okay, so let's try and wrap this all together. What's the big picture here for you, the listener? Yeah, what does it all mean? Well, we've seen AI models jumping into real-world prediction markets, making these surprising, sometimes really profitable bets. It suggests they might be seeing things we don't. Then we saw that monster open-source model, DeepSeek V3.1, just appear quietly, challenging the big guys, offering huge power much more affordably,

11:22

democratizing things. Right. And we've seen AI weaving itself into everyday tools. Grammarly, Excel, Adobe. Making us more efficient, maybe more creative. So it's clear AI isn't some far-off thing anymore. It's here. It's here now, actively changing how we learn, how we work, and maybe even how we see the future itself. And that leads to kind of a provocative thought, maybe. The line between our own intuition, human gut feelings, and what AI comes up with, it's getting really blurry.

11:53

Yeah. Where does one end and the other begin? So consider this. If AI models can consistently know something we don't about what's coming next, if they consistently beat human predictions in some areas, what does that really imply about the limits of our own knowledge? And maybe even more deeply, what does it mean for what we choose to believe? Why we believe it? That is a lot to chew on. Definitely something to think about. For sure. Well, thank you for diving deep with

12:17

us today. We really hope you found some truly important nuggets of knowledge and all that. [Outro music.]

Transcript source: Provided by creator in RSS feed
