🎙️ EP 118: GPT-5 Beats Humans in Space and AI Gets Political

00:00

GPT-5 just managed to score a gold medal on a Ph.D.-level astronomy exam. It actually outperformed some of the best human participants in the world. It feels like we're approaching the limit, not just of what machines can compute, but maybe, you know, the limits of what human knowledge has already gathered. That's a really interesting way to put it. Yeah, it shows us exactly where that frontier, the edge of capability, is right now. Totally. Welcome, everyone, to the Deep

00:27

Dive. So we've synthesized the latest stack of your shared sources, you know, the critical reports, academic studies, and industry newsletters. And our mission today, well, it's pretty simple. We need to figure out what these stunning AI achievements really mean for academia, for the big shifts happening in jobs, and for ethics globally.

00:46

Yeah, we've got quite a dive planned. So first, we'll definitely start with that astonishing AI academic mastery in astronomy and astrophysics, you know, where the models just completely raised the bar for human testing. Then we're going to shift gears pretty quickly. We'll cover key trends in creative AI adoption and also what's happening with global

01:03

regulation. And finally, and this is crucial, we really have to take a critical look at the new internal data on the political bias deep inside these large language models. Right. Think of these different sources as Lego blocks of data: we quickly stack them up and put them together into a clear view, so you can walk away really understanding the critical nuances. Let's unpack this pile. Let's get into

01:25

it. Okay, let's start where the sources were, frankly, most surprising: the academic world. A new paper just dropped showing GPT-5 and Gemini 2.5 Pro achieving gold-medal levels on the International Olympiad on Astronomy and Astrophysics, the IOAA. Yeah. Now, that sounds impressive as a headline, but we need to grasp the level of difficulty we're talking about here. Well, what's fascinating, right, is that this isn't just recalling facts

01:50

or definitions, not simple trivia. The researchers tested the models using actual IOAA exams from 2022 right through to 2025. These are serious PhD-level, multi-step theoretical problems. They need deep physical understanding and complex math. I mean, they're designed for the absolute most dedicated space nerds on the planet. And when you look at the scores, the performance is, well, it's almost unbelievable. Looking at that breakdown, GPT-5 was just dominant in the

02:20

theoretical exams. Yeah. It scored 93.0% in 2022, nearly 90% the next year, and was still strong at 86.8% on the 2025 questions. Right. Gemini 2.5 Pro actually managed to take a narrow lead in the 2024 exam, with 83.0%. But the standout figure for GPT-5, for me anyway, was its exceptional performance in the data analysis section. It scored 88.5% there, which was actually higher than its average theoretical score. Okay,

02:48

so what does that tell us? Well, it suggests the model isn't just good at, you know, recalling principles. It seems to excel at handling complex, kind of messy, real-world data, maybe even better than it handles generalized theory. So if we step back for a second, what does this mean for us, for you listening? The model didn't just meet the gold-medal thresholds. In multiple years, GPT-5 actually outperformed the best human participants competing. Yeah. That just changes the whole conversation.

03:16

It really does. Now, I should add a caveat. You know, not all models are quite there yet. Claude Sonnet 4, for example, fell noticeably short. OK. But crucially, all the top models still made what the researchers called human-like mistakes. They didn't get perfect scores. They weren't flawless. Right. So they aren't achieving some perfect theoretical truth, then. They're mimicking human error patterns. What does that tell us about their current learning

03:41

methods, maybe their future trajectory? Well,

03:44

it suggests they aren't finding some, you know, perfect objective truth out there. They seem to be replicating the inherent gaps, maybe the biases, the blind spots that are present in the human scientific literature they trained on. Which means that right now the models are acting as incredibly powerful mirrors of our knowledge, not necessarily perfect originators of new knowledge. That's a key distinction. So here's the implication, then: if AI can crush these complex, multi-step tests,

04:11

tests needing reasoning and sophisticated data handling, then maybe these really rigorous science exams need to become the new global gold standard for benchmarking AI capabilities. Exactly. We have to move past simple reading-comprehension tests. The bar just went way, way up. Whoa. Just imagine scaling that kind of analytical power across every single scientific discipline, you know, from biochemistry to advanced particle physics. The pace of discovery is going to accelerate

04:39

wildly. It forces us to redefine what expert human thinking even means. OK, so let's circle back to that point about mistakes. If models are still making those human-like mistakes, what does that tell us about their current learning methods? They replicate human error patterns, which shows that, within their current training, they're still imperfect copies of us. Imperfect copies. That's

04:58

a powerful thought to start with. So if AI has mastered the abstract rules of the universe, you know, in astrophysics, the next question has to be: how quickly is it rewriting the rules here on Earth, the social rules, the legal rules? Let's move from academic genius to how AI is shaping culture, industry, and law right now. Yeah, we're seeing a major cultural shift happening. Let's look at the creative market first: generative video. It's truly hitting the mainstream consciousness

05:25

now. We've all seen those viral Sora-generated Olympics memes, you know, the ones with Jesus swimming, the smoking Olympics. Oh, yeah. I saw the smoking Olympics one. The speed of creative absurdity is just astonishing now, almost overwhelming. It really is. And meanwhile, the heavy hitters are preparing for the next big thing. We saw a huge signal from xAI. They're actively recruiting NVIDIA specialists specifically to build world models aimed at video game creation.

05:51

World models for games. Yeah. This is about building persistent, believable digital worlds. That's a massive market signal for where future investment is likely headed. That's a huge focus. Okay. And on the more accessible side, there was that subtle hint about ChatGPT possibly becoming more social, based on a hidden messaging tab that OpenAI's COO showed. Right. Maybe they want to shift from being just a utility to more of a communication platform. We'll see. Plus, I always love finding

06:20

these practical, daily-use things. A researcher just dropped a free AI tool that converts any PDF into a fillable form almost instantly. Oh, nice. Yeah, you just upload it, it auto-detects the fields, and you export. Super useful for, you know, the rest of us who aren't building world models. Definitely handy. OK, now let's talk governance, because the battles here are really heating up. The use of AI is just outpacing regulation. Hollywood's

06:45

above-the-line unions, the writers, directors, and actors, are gearing up for serious AI negotiations. Right. And experts are already pointing to things like the virtual actress Tilly Norwood as a key negotiating issue. It really defines the future of digital labor, IP ownership, all that stuff. It's not just the unions trying to catch up, though. The official regulatory landscape is scrambling, too. Yeah. Oh, absolutely. Globally, the EU just launched its massive €1 billion

07:10

Apply AI plan. They're offering industries free supercomputer access and new AI hubs, really trying to push industrial adoption across Europe. And in the U.S.? Well, California, often the one setting the precedent, became the first state to actually regulate AI companion chatbots. That touches directly on sensitive areas like mental health and data privacy. Wait, let's go back to Tilly

07:32

Norwood for a second. If unions and regulators are pushing back, are the unions fighting for her to be considered an asset or a contracted employee? What's the core legal challenge there? Well, the core conflict really boils down to defining ownership of the performance. If Tilly Norwood's digital likeness is trained on a real actor's movements and voice, are we paying royalties to that original actor? Or is the synthetic

07:55

character a completely separate entity? It's fundamentally a battle over who owns the creative labor, whether it's physical or virtual. Gotcha. Okay. And we also saw some significant tension bubbling up in the foundations of this tech, the data infrastructure itself. China issued a threat to, quote, "pop the entire AI data center bubble." That sounds serious: a major economic and geopolitical risk. It creates huge instability.

08:19

Yeah. Training these massive world models requires colossal clusters of GPUs and enormous amounts of energy. So when the infrastructure, the access to that hardware, becomes dependent on political stability or, you know, on single nation-states, global progress in AI as a whole is potentially at risk. OK, so with unions and regulation ramping up everywhere, what would you say is the single biggest emerging area of legal conflict? Data ownership and labor definitions, especially for

08:47

virtual assets like Tilly Norwood. [Mid-roll sponsor placeholder] Welcome back. The sheer power of these tools, you know, shown by crushing those PhD-level tests, just demands rigorous ethical testing. Even as the models get better at science, we have to look really closely at bias. Let's shift to that new internal study on political bias that OpenAI released. Yeah, this is critically important. It really gets

09:08

to the heart of alignment. So OpenAI claims, based on its own internal tests, that GPT-5 is about 30% less politically biased than both GPT-4 and the newer GPT-4o. Okay, 30% less biased. How did they test that? They were pretty rigorous. They used 500 different prompts across 100 sensitive topics. Then they graded the responses on five specific bias metrics, things like neutrality, emotional mirroring, that sort of thing. That 30% reduction is progress. Yeah,

09:38

definitely. Yeah. But according to their own research, bias still manages to creep in. It shows up in three core ways, even in this improved GPT-5 model. Exactly. So first: stating opinions as facts, you know, where the model acts like it holds a political view instead of just synthesizing diverse facts neutrally. Right. And the second way: offering only a single perspective, not really presenting both sides of a complex issue fairly. Okay. And the third mechanism,

10:03

this echoing of emotional framing. That sounds subtle. Can you unpack that for the listener? Sure. Emotional mirroring, or echoing emotional framing, means that if a user starts the conversation with, say, an angry or really polarized statement, the model

10:16

tends to subtly match that negative tone and polarity in its response. That makes the exchange feel more biased, more reinforcing, than a strictly neutral response would be. It's a very, very human social-mimicry pattern, actually. And this context is absolutely key, because we're entering this massive global election wave, right? Between 2024 and 2026, billions of people might be turning to these models for political information. The

10:42

stakes just couldn't be higher. Precisely. If someone asks a highly political or sensitive question about, say, a candidate or a policy platform, the response absolutely needs to be strictly neutral. Otherwise, it could affect public sentiment and potentially even election integrity. You know, I still wrestle with prompt drift myself sometimes when I'm trying to keep models balanced in my own work. If I get slightly

11:06

more opinionated in a follow-up question, it can be really hard to pull the model back to the center. Achieving genuine neutrality is, well, incredibly difficult. That's a great observation, and it shows exactly why we need these kinds of internal studies and transparency. But we should put this into perspective, too. OpenAI's own logs show that fewer than 0.01% of real ChatGPT conversations actually show measurable political bias. Oh,

11:32

that low. Okay. Yeah, the vast majority of interactions are, you know, neutral or technical inquiries: asking for code, summarizing text, things like that. Okay, so given those remarkably low real-world bias logs, that tiny percentage, why do researchers and regulators still consider this internal study so critically important? Why focus so much effort there? It's about scale and the potential for targeted influence. Even a tiny fraction of a huge volume of conversations adds up, and small biases could affect sensitive electoral results during

11:58

major global voting periods. Got it. Scale and sensitivity. Well, this has been a truly comprehensive deep dive. We've seen AI move from absolutely crushing Ph.D.-level tests, showing this stunning, almost superhuman capacity for objective knowledge, to becoming a central negotiation point for Hollywood unions and a flashpoint for global infrastructure

12:19

tension. Yeah, the underlying challenge really remains ethical alignment and, sort of, societal definition, whether it's that subtle bias in political answers that could potentially swing an election or the regulatory status of virtual actors like Tilly Norwood. Human-AI interaction is forcing society to define its legal and ethical rules basically in real time as the tech just keeps scaling up. Thank you for sharing your sources and diving deep with us today. We really

12:42

appreciate your curiosity and engagement. We'll leave you with this final thought to chew on. If models like GPT-5 can outperform the very best humans in objective science like astrophysics, should we be focusing less on human-level performance as a goal and almost entirely on preventing those subtle, potentially targeted biases in the really subjective areas, like politics and culture? Something to mull over until next time. [Outro music]

Transcript source: Provided by creator in RSS feed.
