Bloomberg Audio Studios, Podcasts, Radio News.
Hello and welcome to another episode of the Odd Lots podcast.
I'm Joe Weisenthal, and I'm Tracy Alloway.
So, Tracy, you know, you ever come across some writing you can't articulate exactly why, but you're like, I'm pretty sure AI wrote this?
Does this happen to you much?
So, full disclosure, I haven't really thought about it that much. Yeah, because the thing is, I probably should think about it more, but there's a lot of bad writing out there, and I've become sort of inured to it. And I also think that, I don't know, trying to figure out whether or not something was generated by AI nowadays, if you actually dedicate a lot of your own time to doing that, that is a huge mental burden to be attempting.
Especially since you and I are in the journalism industry. How many of the pitches that we get from PRs right now do you think are being generated by AI? Imagine if you're reading each one of those and trying to figure it out on a daily basis.
You know, where I suppose I think about it the most is someone will respond to a tweet, yeah, and I'll be like, well, if this is a real person, then maybe this person deserves some engagement, and I'll ask a question or I'll want to respond. But if the person is a bot, then obviously I don't. And that's where I'm like, you know what, I want to figure it out. I would like to know the answer.
You know.
I have a controversial view about AI writing, by the way, which is that it's pretty good. I mean, like, by and large, and I said this, I think, maybe in a recent episode. When you consider the fact that, I don't know, the majority of the population doesn't know where to put a comma within a sentence. Well, this is my point.
It's pretty good.
I mean, yeah.
One thing I'll say about AI is it never gets the placement of a comma wrong.
On some level, it's perfect.
Did you do that test? I think it was in the New York Times.
I kind of hated that.
Okay, why? Well, because, I'll tell you: first of all, it's five examples.
That's not very many. Two, it asked the reader, which do you prefer?
But I think they were different subjects as well.
Yeah.
Also, I think most people probably treated that as, can you guess which one is a human? Because everyone wants to say they prefer the human. I didn't think it was, like, a great test. Nonetheless, look, not only is it often indistinguishable, it is often fine writing. Sometimes AI can come up with a really remarkable turn of phrase. Yeah, but I still, by and large, don't like it. You read, like, a thing, especially a long text that's AI, and it's like, even if you can't articulate it, it's like, this feels AI.
It has a certain sickly sweetness to it that is often annoying.
It's annoying.
What I notice about it is it doesn't do style very well, right? So if you ask it to write something in the style of a writer, if you choose anything other than something really obvious like Shakespeare, it really suffers. But the text that it actually outputs is pretty clear. Yeah, right, like for basic understanding. Totally. It's probably better than a lot of what's on the internet.
The real people who are going to have to worry about this are, like, teachers, obviously, universities and lawyers, student lawyers. And maybe it's fine, but there are times when it's like, okay, did someone write this or not?
And it'd be nice if we could know the answer.
Well, the other thing that's starting to happen is have you seen any books out there that actually come with a disclosure or disclaimer that say this book has been written only by humans?
No AI used at all.
I saw that for the first time on a book that we actually read for an Odd Lots episode. I don't think it's come out yet, but that kind of threw me.
Yeah.
No, it's more and more. Anyway, as we enter a world in which the vast majority of words written, if not already, are written by AI, we're going to be interested in this question of whether we can know. Anyway, there's this company called Pangram Labs, and they have a little thing, and you can pay for it, but there's also a free service, where you can drop a text in and it'll give the odds that it was written by a human or AI. And I'm pretty impressed by it. I did some samples of my own writing and then AI outputs, and it got them all right. But then I tried to stump it. So, what I did was I took a piece of AI writing and then I had it translated into Chinese, okay, and then I had it translate that into High Chinese, so it's like, okay, imagine this is being written in a more formal register. And then I had that translated into Hebrew, and then I had that translated into English. So the original thing went through this series of AI telephone, through various translations, and then I put that output back into Pangram.
It got that right. It said it was AI.
So even after a series of sort of transformations designed to obfuscate the original style of the piece to see if you know, eventually it would emerge in something else. So I was pretty impressed. It seems to work. And you know, I think that's interesting for a couple of reasons, which is maybe there is something that you can just tell.
But two, it sort of worries me, because, you know, there have been articles and they'll say, like, this is written by AI. And I think one of my big fears would be that I write something.
I like to use an em dash.
I've always been an em dash fan. I love em dashes. That's how people talk.
I'm sorry.
And then what if it says you wrote this by AI, and I'm like, I didn't? And then here's this black box that is suddenly, like, judge, jury, and executioner for my career, potentially. Who wrote this? AI, the lab says, so you are now done. Like, that worries me. So I think this raises a lot of very interesting questions about these little model detection things, and I want to learn more about how they work.
There's also a lot of philosophical questions about just what we value in writing as well, because no one's going to yell at you for using spell check or something like that, right? Like, it's kind of crazy to think that reputational risk is going to hinge on whether or not you might have used a platform, a chat platform, to, like, do some basic copy editing.
Totally. Well, very happy to say, we do, in fact, have the perfect guest.
We're going to be speaking with Max Spero.
He is the founder and CEO of Pangram Labs, and he can answer all of our questions. So Max, thank you so much for coming on Odd Lots.
Thanks for having me.
How do you know it's right?
So someone puts in a piece of text, and we'll get into the method in a second. But someone puts in a piece of text and it says human or AI. What makes you believe that you have a very good track record on this question?
So when we started Pangram, we started by doing this thing we call a human baseline, which is, how well can we as humans predict whether something's AI or not? That's the first step of, like, learning: is this problem tractable? How hard or easy is it? And I found, like, me personally, I was able to get about ninety percent accuracy, and so we figured an AI model should be able to do much better than that.
So I have a bunch of methodology questions which we can get into. But just before we get into any of that, why is AI slop bad, in your opinion? Why does it need to be tracked and identified?
I think the problem is it's just so easy to generate, and so it's very difficult to know what the intent behind it is, basically. Right now, I think we're actually pretty lucky. We live in a world where the signal-to-noise ratio on the Internet and in our information channels is pretty high.
We have pretty high signal to noise, but any bad actor can come in and just flood our information channels with AI slop that looks legitimate. It looks like somebody put actual effort and thought into it, but really it was just, like, a single prompt, which could have also been automated.
This is something that I think about a lot, which is that there was a point in time and maybe still is the point in time where if you read something that was grammatically correct, where the punctuation was strong, where the spelling was strong, there was reason to think that the person who wrote it was a person of like certain seriousness and a certain intelligence behind it.
And I think that the issue that you're identifying is that that link is now being severed, so that we can't use these heuristics anymore, such as the strict quality of the prose, to know in fact whether this was published by someone who was, like, a serious actor, intelligent or not.
And now you have people inserting typos into their card. That's true, they are. Yeah, boy.
Sorry, just to go back to my original question. So you mentioned, okay, you're able to get it ninety percent right, but now it's been used a lot more and you have people paying for your software, presumably teachers and journalists, etc. Given all of that, getting from ninety percent to one hundred, I mean, if you get one out of ten wrong, that's clearly an unacceptable error rate for a piece of commercial software that could call someone an AI creator. So you have to do a lot better than ninety percent. Talk to us about, like, what you've seen so far in your data since releasing it as commercial software that makes you believe the software is doing a correct job of allocating between the two categories.
So we've built out really comprehensive evals, okay, and so in our evaluations, there's two kinds of errors. There's a false positive, which is when something is written by a human and then we say that it's written by an AI, okay. And there's a false negative, which is if it was AI written and we don't catch it. And so we track our numbers for both of these, and for human writing, we're actually pretty fortunate.
We have, like, millions and millions of samples, so we can get a false positive number that we have a very high degree of confidence in. And our number right now is about one in ten thousand. Okay. So, if we scan ten thousand documents, on average, one will come back as AI when it was actually human.
And what about in the other direction false negative?
I would say around ninety nine percent accuracy, so, like, around one percent false negative rate. I think this depends a little bit more on, like, how adversarial the prompting is, how much they're trying to evade.
What I did, exactly: send it through multiple filtrations to obfuscate the original output. That would be an example of adversarial prompting? Exactly.
But in like the general case where we're just looking at straight outputs from AI, it's above ninety nine percent.
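As a rough illustration of the two error types described above, here is a minimal Python sketch of how a false positive rate and a false negative rate could be computed from labeled examples. The function name `error_rates` and the toy data are assumptions made for this example, not anything Pangram has published.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    labels and predictions are equal-length sequences of "human" or "ai".
    A false positive is human text flagged as AI; a false negative is
    AI text that was not caught.
    """
    pairs = list(zip(labels, predictions))
    false_pos = sum(1 for y, p in pairs if y == "human" and p == "ai")
    false_neg = sum(1 for y, p in pairs if y == "ai" and p == "human")
    n_human = sum(1 for y in labels if y == "human")
    n_ai = sum(1 for y in labels if y == "ai")
    fpr = false_pos / n_human if n_human else 0.0
    fnr = false_neg / n_ai if n_ai else 0.0
    return fpr, fnr


# Toy data (made up): four human documents and four AI documents.
labels = ["human"] * 4 + ["ai"] * 4
predictions = ["human", "human", "human", "ai", "ai", "ai", "ai", "human"]

fpr, fnr = error_rates(labels, predictions)
print(fpr, fnr)  # 0.25 0.25
```

The rates quoted in the conversation (about one in ten thousand false positives, about one percent false negatives) would correspond to `fpr` near 0.0001 and `fnr` near 0.01 on a much larger evaluation set.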
Okay, okay. So what is your model looking for exactly when it's evaluating a text? Because, as we mentioned in the intro, you know, syntax and grammar tend to be pretty good on AI generated copy. The style is sometimes more of an identifier, I would argue. To your point, Joe, like, sometimes it reads very saccharine and kind of overly earnest in some ways. So what exactly are you focusing on here? What are the tells?
Yeah, so the style and the word choices are definitely part of it. But I think what a lot of people don't realize is they're actually making a lot of decisions when they write a piece of text. So there's, you know, dozens or hundreds of ways to phrase every single phrase, and over the course of fifty or one hundred or two hundred words, you're making thousands of decisions, actually. And so what we're doing is we're learning the patterns in how these frontier models make these decisions. And if the vast majority of these decisions line up with how the frontier models are doing it, then it's vanishingly unlikely that this was written by a human. You would have to just happen to make the same exact decisions that the LLM does hundreds of times.
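To see why matching many decisions by chance becomes "vanishingly unlikely", here is a back-of-the-envelope Python sketch. It assumes, purely for illustration, that each writing decision is independent and that a human matches the model's choice with a fixed probability; neither assumption comes from the conversation, and a real detector does not compute this number directly.

```python
def chance_of_matching(per_decision_prob, n_decisions):
    """Probability of matching every one of n independent decisions."""
    return per_decision_prob ** n_decisions


# Assumed 50% chance of matching each decision (an illustrative value).
for n in (10, 100, 500):
    print(n, chance_of_matching(0.5, n))
```

Even with a generous per-decision probability, the product shrinks toward zero quickly as the number of decisions grows, which is the intuition behind the argument.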
Interesting. Okay, this is a really important point.
So everyone at this point has some feel for, like, oh, the em dash tell, right? But my understanding is it's not like you go in and, like, hard-code "if you see a bunch of em dashes." The thing is, these decisions, in many cases, I imagine, neither you nor the model itself can articulate in English what the decisions are. All you know is that the decision pattern exists.
Is this correct?
This is correct.
Okay. Can you explain?
So therefore, what does it mean that your model has learned these decisions?
So what we're doing on the very broad scale is we're training a deep learning model. So it's a pretty big black box, but it has the base model of a language model, and then instead of predicting the next token, it's predicting whether the text is AI or not. Okay. And how we train it is we train on tens of millions of examples, so it sees millions and millions of human examples, and for each human example, we also show it an AI example. So, for example, let's say one of these is a five star review for Denny's that's seventy eight words long. Then we'll ask an AI to write a five star review about Denny's that's seventy eight words long in the style of the first one. And obviously these two will be different, and so our model is able to learn through contrast what the difference is between them.
And the important thing, sorry, just to be clear here, is that you and I might not be able to articulate the difference. There will be some difference in maybe the sentence length, there will be some difference in word choice, there'll be some difference in punctuation, syntax, whatever, but you and I wouldn't obviously spot it. However, after millions of examples of these side by sides, the model learns what the difference is. Exactly.
I think the best that a human can do is look for some of these really obvious tells, like, ChatGPT loves that, like, "it's not just X, it's Y" framing. Earlier models really liked some specific words like tapestry and intricate and delve.
Yeah, delve tapestry. Yeah.
But yeah.
I think by training Pangram, we're able to go much deeper than this and look deeper than the high-level signs, at the, like, document-level signs.
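The "synthetic mirror" idea described above (pairing each human example with an AI example of the same kind and length) might be sketched in Python as follows. Everything here is hypothetical: `build_mirror_prompt`, `make_training_pair`, and the stand-in `fake_generate` are invented for illustration, and the prompt wording is a guess, not Pangram's actual prompt.

```python
def build_mirror_prompt(human_text, description):
    """Build a prompt asking for an AI 'mirror' of a human example.

    The mirror matches the human text's topic and word count, so the
    main remaining difference is who (or what) wrote it.
    """
    word_count = len(human_text.split())
    return (
        f"Write {description} that is about {word_count} words long, "
        f"in the style of this example:\n\n{human_text}"
    )


def make_training_pair(human_text, description, generate):
    """Return [(human_text, 0), (ai_text, 1)]; label 1 means AI.

    `generate` is a hypothetical callable (prompt -> text) standing in
    for a call to whatever language model produces the mirror.
    """
    ai_text = generate(build_mirror_prompt(human_text, description))
    return [(human_text, 0), (ai_text, 1)]


def fake_generate(prompt):
    # Stand-in for a real model call so the sketch runs offline.
    return "Denny's delivers a delightful dining experience every time."


review = "Great pancakes and the coffee never ran out. Our server was lovely."
pair = make_training_pair(review, "a five star review for Denny's", fake_generate)
for text, label in pair:
    print(label, text)
```

A classifier trained on many such pairs sees two texts that agree on subject and length but differ in authorship, which is the contrast the conversation describes.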
So one thing this kind of reminds me of and I'm thinking how to phrase this, but it reminds me of you know those exercises people used to do where you would take a bunch of different faces and meld them all together and come up with like one face that was supposedly attractive. So, like, to what extent is this basically a distributional detector in the sense that you're looking for like certain paths that you think AI would choose.
And I guess, like, could you get a false positive just from someone who's choosing like the average of the average of the average in a way to state a particular sentence.
Maybe. There's a reason our false positive rate is one in ten thousand and not zero. It's because, you know, sometimes we look at the false positive and it's like, oh, it reads exactly like an AI generated review or essay, except that it was written in twenty nineteen. So it was probably a human who just happened to find the exact, like, mode-collapsed type of way that, like... Yeah, that's right. Yeah, I would say, yeah.
I think it's a good way to think about the distribution of writing, or writing as a distribution, where, like, you know, there's the space of all human writing, and then AI writing is really just, like, a small point within this space.
No matter how much you prompt it, it doesn't go that far from where it was trained to be.
Yeah, okay, what's the black box?
So I built a little model myself. I built this thing where you can upload text and it says whether it's more resemblant of the written word or the spoken word. Oh, I saw that, yeah, yeah. And I used BERT, which is, like, one of these open-source ones from Google.
What is the core model that you trained on, or did you build it yourself?
Like, talk to us about that.
Our very first model was actually built on BERT, but for future models we needed to up our capacity. So basically we were running into capacity limits with our model. It was capping out at a certain false positive, false negative rate. It wasn't learning the deeper signals, so we had to ten x and then one hundred x the parameter count so that it can learn, like, really deeply how these frontier models write.
Have you noticed any interesting differences between how the models write? And actually, is your model trained to identify different models as well as whether or not this is just broadly AI generated?
So we don't specifically train it on different models. We don't say, like, hey, this one is Claude 3 and this one is ChatGPT or GPT-5. What we've done is some interpretability work to look at basically the output embeddings of the model, and we find that it actually learns which model the text came from. So you could see, like, little clusters, like this is the Claude cluster, and all the Claudes, yeah, cluster around here, and then these are, like, DeepSeek and Qwen, and then this is, like, ChatGPT, and they all kind of cluster into different spaces in embedding space. So clearly the model is able to learn what the difference is between these frontier models.
Actually, since you mentioned Qwen, I'm very interested: is there anything, like, distinct in terms of how Qwen generates text versus platforms that have been developed in the US?
I think Qwen is unique because it's trained on a lot more Chinese and multilingual tokens than other models. So, you know, I've heard from Chinese friends that it's much better at, like, being conversationally fluent in Chinese.
Beyond that, I don't know that I can tell.
It would be hard for me to look at a text and say, like, I know that's Qwen. But I think somebody who's more familiar with it might be able to.
Let's talk about sort of some of the philosophical or societal implications of this work.
Have you had anyone whose text has been judged to be AI written by Pangram and they're like, I swear to God, this isn't, you're wrong? They, like, really insist. And what do you think about this situation? What do you do? Talk to us about that.
I've had a couple of times this happened. There have been times where I genuinely believe that you know this is just a false positive. We scan hundreds of millions of documents, so like, at a certain scale like this will happen. But I also get people who all the time they're just like AI detectors don't work.
It's like a total fraud.
And then whatever they're putting out on LinkedIn is just one hundred percent AI generated.
And they're just like mad that they're getting called out.
And then you look back farther into their past and their history, and, like, everything they're putting out is AI generated back until about, like, twenty twenty three. Like, for everyone, if you look historically, there's a lot of, like, slop accounts that are putting out total slop, and you can tell either they, like, weren't posting as much before, or if you scan back in time, then you see that they were writing human text at some point.
So there's a number of accounts out there where, basically right around the beginning of twenty twenty three, if you scan the entire corpus of their work, it very clearly shows a switch.
Right around early twenty twenty three.
Yeah, it really, like, depends on the account. I think one thing we saw that was interesting was there is a writer for The Guardian that was covering the Winter Olympics, and somebody was like, hey, this article is, like, total AI slop. Ran it through Pangram, it was AI. The Guardian was like, no, of course, our writers don't use AI.
And so we scanned this single writer's history and we found that they really did start picking up AI, like, mid to late twenty twenty four, and were using it more and more in their articles.
I mean, just to play devil's advocate for a second, does intent matter when it comes to identifying AI slop? In the sense that, okay, I get you can have a bad actor who's maybe trying to influence how people feel about a particular topic, and maybe they've created a bunch of bots on Twitter slash X and they're using AI to just flood the zone with a bunch of AI slop supporting their particular viewpoints. On the other hand, if you're a journalist and your business is to write, you know, like, basic understandable copy about a news topic, and just to be clear, I'm not advocating this at all, but that intent is very different to, I'm going to try to influence something by just, you know, sheer volume.
Yeah, I mean, definitely these are like one is a lot more severe than the other. But I think at the same time, if you're a journalist and you're using AI to basically shirk your work and like not do your work, I think that's also a problem. And I think it's a reputational risk to the outlet because people can tell and people are going to call you out. There's a lot of people who don't want to read AI slop kind of regardless of where it's from.
Yeah, this is definitely true. Are you ever going to run out of human material to train on?
Right?
Like you could be pretty confident that if you find some piece of text that was published on the internet prior to twenty twenty three, but certainly prior to like twenty nineteen or something like that, you can be extremely sure that this is human generated. Do you worry that in the future, like it's going to be harder to even establish the provenance of your training data.
Uh, Yeah, it's definitely a concern for us.
Talk to us about how to think about this.
So we have a near infinite data reservoir of pre twenty twenty three data. There's just, like, more than enough for us to train on for a long, long time. But part of the problem is we also want to train on modern text. There's all this talk about, like, if somebody's writing about LLMs or about AI, we don't want to incorrectly flag that as AI because our training data has no sense of this topic. So I think we're looking at different ways to do this, but most of them are just, like, figuring out who is a trusted actor.
Who do we know is putting out human-written content? And we could use our model for that, like, to some degree. And then, so we have known actors, we know they're putting out human-written content, and then we could use theirs as well.
Slightly random question, but using your model, are you able to quantify, like, what percentage of the Internet at the moment is AI slop?
It's about forty percent. Based on what? How'd you get that number?
So a lot of the Internet is just, like, SEO written articles and, like, yeah, it's articles written for search, basically, so that your website comes up more often in search because it's targeting certain keywords. And a lot of that industry has switched over to using AI because then, instead of having to pay writers, you could turn out articles for pennies on the dollar. But I think that kind of results in a lot of the Internet being AI written. It's a little bit also kind of platform dependent. It's about forty percent from, like, an Internet page perspective. About a year and a half ago, we looked at Medium and found that over fifty percent of newly written Medium articles were AI generated, which was a crazy high number.
What about Reddit?
Reddit, it was seven percent a year ago. I believe it's a little over ten percent now.
Well, actually this reminds me. So I'm on Reddit a lot and I really enjoy it nowadays as a platform, but I do worry about how much of it is being generated by AI. And the thing I don't necessarily understand is what are the economic incentives to actually write a bunch of AI generated posts on Reddit and get upvoted? Like, why does that system or motivation even exist?
So there are startups I'm not going to name names because I don't want to promote them, but they will sell a promise to companies that we're going to get you organic mentions on Reddit. We're going to run our AI bots that seem organic, and they're just going to, you know, naturally recommend your product or you know, just mention your product in the comments or in a post.
And so I've seen evidence of this. We can find these, like, they're basically bot farms that are mostly engaging seemingly organically, just, like, doing a short reply, and then sometimes they're doing this brand mention. And so that's why these posts are very valuable.
That's really interesting.
You also have to imagine it's valuable because all of the models train on Reddit, right? And if you want your product's name to appear in model outputs, it's like, what is the best, you know, nose hair trimmer or whatever, and there's a bunch of bots on Reddit that talked about this nose hair trimmer, then that's probably more likely to show up in a ChatGPT request, right?
Yeah, yeah, it's been weirdly gamed. You know, you used to just google best nose hair trimmer, and now there's like a thousand.
The Reddit search results like show up first nowadays.
Yeah, that's where people are looking.
Yeah, and then people start searching "best nose trimmer Reddit" to get the Reddit comments on it. And now people have realized that that's what people are searching for. So you need to populate Reddit with your advertisements.
I'm on Men's Health. Are you looking for nose hair trimmers?
The Panasonic ear and nose hair trimmer is the number one choice on Men's Health. Pros: easy to hold. Anyway, it's not.
Yeah, it's all these affiliate links. Yeah, just destroyed the Internet.
I know, it's too bad, but whatever. Talk to us more about the whole pipeline. So, I'm very fascinated by this idea. It's like, okay, you see this review for Denny's. You have the AI model try to replicate it as best as it could, with these subtle differences. Talk to us, though, about, like, the whole pipeline.
What are the other tests that you're using to get the true, you know, because what I imagine you're trying to do is get the most similar data sets with an almost imperceptible difference to really stress test. Yeah, talk to us really about the whole pipeline.
Yeah.
So what we're really trying to do here is we're...
As a model maker myself, no, no, sorry, keep going.
Yeah, as an AI expert, yeah, yeah.
As an AI expert, I need to hear some tips from the field.
Uh, yeah. So what we're really looking for is examples that are as close to the boundary between human and AI as possible, so that our model learns better. With something that's very obviously AI, you know, our model's not learning as much. Same thing for something that's obviously human. And so step one is creating this data set with synthetic mirrors of human examples, and then we train a model,
and then step two is something called active learning. So we then take this model and use it to scan a much larger corpus of data and look for errors, false positives, false negatives, and then we pull those back into our training set and are able to train a much better model because it's seen these errors, which and these errors we believe are just much closer to the boundary between human and AI.
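The two-step loop described above (train, scan a larger labeled corpus for mistakes, fold the mistakes back into the training set, retrain) can be sketched in Python with a deliberately tiny stand-in model. The one-number "feature" per document and the threshold classifier are assumptions made so the example runs; the real system retrains a deep network, not a threshold.

```python
def train(examples):
    """Toy 'model': a threshold halfway between the two class means.

    Each example is (feature, label) with label 0 = human, 1 = AI.
    """
    human = [x for x, y in examples if y == 0]
    ai = [x for x, y in examples if y == 1]
    return (sum(human) / len(human) + sum(ai) / len(ai)) / 2


def predict(threshold, x):
    return 1 if x > threshold else 0


def mine_errors(threshold, pool):
    """Scan a labeled pool and return the examples the model gets wrong."""
    return [(x, y) for x, y in pool if predict(threshold, x) != y]


def active_learning(train_set, pool, rounds=3):
    """Retrain repeatedly, each time adding the newly found errors."""
    train_set = list(train_set)
    model = train(train_set)
    for _ in range(rounds):
        errors = [e for e in mine_errors(model, pool) if e not in train_set]
        if not errors:
            break
        train_set.extend(errors)
        model = train(train_set)
    return model, train_set


seed = [(0.1, 0), (0.9, 1)]                      # obvious human, obvious AI
pool = [(0.2, 0), (0.55, 0), (0.7, 1), (0.8, 1)]  # larger labeled corpus

first_model = train(seed)
print("errors before:", mine_errors(first_model, pool))
final_model, final_set = active_learning(seed, pool)
print("errors after:", mine_errors(final_model, pool))
```

In this toy run the only mistake is the human example sitting near the boundary, and adding it to the training set moves the threshold enough to fix it, which mirrors the point that errors tend to be the most informative, near-boundary examples.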
So sorry, just to be clear, the first pass is, like, okay, you have known human writing and known AI writing. You train a model, and then the next pass is once again on known human and known AI writing. So you already know the answer of each of these, and therefore you could come up with a list of which it got wrong, and then that gets fed back into the first pass?
Exactly, and so once we retrain, then the model gets much, much better, and then we could do this as many times as we want to, to kind of just have a self-improving model that gets better with every training run. I can also go a little bit more into how we deal with AI edits, because I think that's an increasingly important problem. Like, I think most writing will be AI assisted in the future. I think it's already in Google Docs and it's in Google Keyboard.
Grammarly arguably has been doing this for a while.
Exactly.
Yeah, Grammarly uses LLMs on the back end, and we don't want to just say, like, all writing is AI now. We want to be able to differentiate between AI assisted and AI generated. So what we do is we also have different prompts. So for our human review of Denny's, rather than saying, generate a review like this, we could say, help improve this, make it more formal, clean up the grammar.
And so we have, like, a long list of AI editing prompts, and then we're able to look at basically the cosine difference, the distance between the original human text and the...
In that hyper multidimensional space.
Exactly. So how much did AI change this text? And then we're able to train our model to say, like, we're just going to put a point on this distance and say, like, this is moderate AI assistance, this is light AI assistance, and this is heavy AI assistance.
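A minimal Python sketch of the distance idea above: embed the original and the edited text, measure the cosine distance between them, and bucket the result into light, moderate, or heavy assistance. The bag-of-words `embed` function is a crude stand-in for a learned embedding, and the cutoffs 0.1 and 0.5 are invented for illustration; the conversation does not say what embedding or thresholds are actually used.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 minus the cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assistance_level(distance, light=0.1, heavy=0.5):
    """Bucket a distance using illustrative (made-up) cutoffs."""
    if distance < light:
        return "light"
    if distance < heavy:
        return "moderate"
    return "heavy"


original = "the food was good and the service was fast"
edited = "the food was excellent and the service was prompt"

d = cosine_distance(embed(original), embed(edited))
print(round(d, 3), assistance_level(d))
```

Here two of nine words were swapped, so the distance lands between the two cutoffs. In the setup described, these distance buckets become training labels, so the deployed model predicts the assistance level without needing the original text.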
Interesting. I'm going to do something I don't think I've ever done before, which is ask a founder about their corporate mission. But you know, you've set up this company, and when you think about what you're trying to do here, is it just basic AI detection in the sense that there might be you know, a few groups of people like teachers that find this very valuable, or is the mission something broader where you're actually trying to improve the Internet and what people see on it.
I believe the technology of being able to detect AI generated content is immensely valuable, and it's valuable not just for teachers, but for basically everybody in every profession: lawyers, publishers, just an individual who consumes content on the Internet. I think it's valuable for all these people. But ultimately, yeah, our high level goal is to help mitigate some of these negative effects of growing AI content.
But for instance, just using the product review example, is the vision that like a Yelp, for instance, would want to use this technology to make sure that its system isn't being gamed or is the vision Like if I am a particularly diligent consumer who has a lot of time on my hands and I'm looking to go out to a restaurant, I can run all these individual restaurant reviews through Pangram and then like actually figure out if it's real hype or not.
So I think right now it's a lot of the former. We work with platforms. One of our biggest customers is Quora, and they run a bunch of content through Pangram. But we have a lot of different platforms that use Pangram to help moderate and find AI bad actors and get them off their platform. But I also think, yeah, the individual consumer case has been growing a lot, and we're really interested in pushing here.
The free version of pangram.com, like, you get a handful of tests a day or something like that. If someone had an unlimited number of Pangram responses and maybe had access to the Pangram API at infinite scale, could they theoretically learn a prompt that they would then be able to put into an AI to generate human-style writing?
I actually had a friend do that. He put his Claude Code on a loop. I gave him some API credits, and then his Claude Code just basically worked overnight writing a prompt, trying to get it to put out something that's human written, or that came back from Pangram as human written. They got there, but the text was pretty, like, uh, incoherent. So, like, yeah, it was producing more or less long gibberish. It was, like, grammatically incorrect. A lot of the words just didn't really make sense.
Because this was my first thought, like when I saw it, I was like, that would be like a fun experiment to see if you could take all the outputs, find the difference and just keep iterating on the prompt you would have to tell AI in order to eventually get an output that looked to Pangram like it was human generated.
Yeah, I think there's a way to do it if you also had, like, an LLM judge on coherency and used, like, Pangram and the coherency judge both to score your text. I think it's definitely possible, and I'm excited for someone to try to do it, because we could make our model a lot better and more robust if this existed.
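The two-judge idea above can be sketched in Python as a scoring rule that only rewards text doing well on both measures. Both scorers here are hypothetical callables with made-up scores between 0 and 1; this does not use or describe any real Pangram or LLM-judge API.

```python
def combined_score(text, human_likeness, coherency):
    """Score a candidate by its weaker judge, so gibberish that fools
    the detector (or fluent text that does not) both score low."""
    return min(human_likeness(text), coherency(text))


def best_candidate(candidates, human_likeness, coherency):
    return max(
        candidates,
        key=lambda t: combined_score(t, human_likeness, coherency),
    )


# Made-up scores standing in for a detector and a coherency judge.
detector_scores = {
    "zxq blorp fen": 0.9,                          # reads human, but gibberish
    "In conclusion, it is a rich tapestry.": 0.1,  # coherent, flagged as AI
    "Pancakes were fine, coffee was cold.": 0.7,
}
coherency_scores = {
    "zxq blorp fen": 0.1,
    "In conclusion, it is a rich tapestry.": 0.9,
    "Pancakes were fine, coffee was cold.": 0.8,
}

winner = best_candidate(
    list(detector_scores), detector_scores.get, coherency_scores.get
)
print(winner)  # Pancakes were fine, coffee was cold.
```

Taking the minimum is one simple design choice among several; it directly blocks the failure mode described in the conversation, where optimizing against the detector alone drifted into incoherent text.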
So I want to know what your personal, like, token budget is nowadays that you're even contemplating some of this stuff.
Well, I feel like, I have the Claude Max plan, you know, and when I'm at work, I don't work on any of my vibe coding projects.
And you know, like when we were kids.
I don't know if you remember, like, if you didn't eat all your food, someone would say, oh, there's starving kids in the world.
Yeah, I'm like, oh, it's starving vibe coders.
It's like, oh, you didn't.
Like, I have this four hour token window and I'm almost never maxing it out, and I'm just thinking, there are kids on the other side of the world that wish they had your tokens, and you're not using all of your tokens for the window.
How dare you?
I feel a little guilty when I don't max out my Claude Max token program.
I also have Claude Max and yeah, most days I'm not doing much coding at all, I'm not maxing it out, and then some days I'm going...
You feel guilty about that, though? It's like, yeah, yeah. So, I feel like writing is kind of interesting, but what are the prospects of this being able to work on, say, and you must get this a lot, image and video generation? Is it all theoretically similar? Is there a reason to think that it will be replicable? Or is this just a different beast of a problem?
I think the approach is definitely doable. I think some of the economics change, especially if we look at video and the cost of generating video today. Okay, we can't generate video at the same scale that we can generate text, and so we might need a kind of different approach. But I also believe that if we're able to solve this for image plus maybe like audio, that could be enough to just solve it for video as well.
Huh, zero shot.
Could you ever envision, I don't know, launching some sort of certification program for video? Because, my dad's a boomer, spends a lot of time on Facebook, and this seems to be what society needs, right? Like a video that comes with a little thing that says this is not AI generated, and someone has actually rubber stamped that.
So there's an organization called C2PA, and I think they're doing pretty good work on content provenance. Basically, they are working with phone makers and hardware makers to embed hardware signatures to prove that images and video were truly taken from the hardware.
Like watermarks, basically.
Yeah, exactly. So rather than marking the AI outputs, we're instead embedding a proof of authenticity in the thing that's real and is captured
in real life.
That's interesting, all right, So big picture, where's the Internet going?
You know, you mentioned forty percent of the Internet is already AI generated, but maybe that's not the end of the world. Like, you know, if it's just a bunch of SEO pages that I never read, I don't know, whatever. But give us some thoughts, high level, about the trajectory of the Internet, regardless of the uptake of Pangram and other AI detection models.
I'm a little bit worried about the state of the Internet. I'm gonna be honest.
I think right now, so much of it is still built around trust and norms, in a way that we're not really well equipped to suddenly deal with an onslaught of bots at a completely different scale than we've dealt with before.
There's maybe like a good case and a bad case.
I would say the bad case is the Internet goes the way of dead internet theory, just like every space that's open and accessible is flooded by bots, and then the only place people are able to communicate authentically is in very walled garden, closed servers, like Discord servers for example, where you know everybody's identity is known and you know you don't
have bots in there. So that's maybe the bad scenario.
Can I share an insane thought that I've had? Go on. You're gonna get a kick out of this. So, I forget what they call this idea for the bad actors, it's
Called like heaven mode or heaven banning. Have you heard of this? So there's this thought that one way.
you could deal with bad actors on the Internet is suddenly they're on a version of, say, Twitter, in which there are only bots and everyone always agrees with them on everything, and it drives them crazy and stuff like that, and they would never know it because they're like, oh, this is cool, everyone's there. So this is a way you could punish people, by putting them on an internet where they will never get any fight.
Banned and put into, basically, jail. You're talking to a bunch of...
That's right, that's right, that would be jail. But you're heaven banned.
But I thought, and again, this is, you know, I built this little AI model myself and I showed it to my friends, and they're like, oh, it's really cool, Joe. I'm really impressed that you're able to do this. And I was like, are people being honest with me? Have I been heaven banned? Because, like, you can be honest with me if it sucks.
And I sort of have the fear.
The biggest humble brag: I did this thing and everyone thought it was great.
I'm just saying, I'm worried that people are being nice to me because, like, oh cool, yeah, that's impressive, you did that.
And I have this like deep anxiety that like people aren't giving it to me straight about it. I know that sounds like a humble brag, but it's really not.
That's why you can never get like too successful, like Maya West surrounded by a bunch of you never get.
Like, oh, this is his first try doing something with vibe coding. I'm like deeply anxious, Like, no, you could just tell me if it sucks, that's fine, that's my worry.
I don't worry about this.
If I tweet that I'm eating a steak, I will get like a hundred people criticizing me, like, you didn't...
Put the meat.
Yeah.
Yeah.
So that's the other thing, which is that the two things you are never allowed to tweet about are meat preparation and enjoying life, because if you ever enjoy life, and if you ever prepare
meat, people will flip out at you on the internet. Those are the two things that you're not allowed to do online.
Very true. This is a sort of related question, but just going back to the methodology, if you're focused on this sort of path dependent idea, I'm kind of envisioning it as a giant decision tree, right? Is there a possibility that as the models get better and better, and we know that they're already injecting like some degree of
randomness into their output. Although I know there's going to be a pedant out there who like messages me and says like, well, you know computers can't do like true randomness. But setting that aside, setting that aside, like, we know that they're adjusting, they're becoming more sophisticated at an incredible rate. We know that they're trying to adjust and inject some randomness in order to avoid exactly this kind of detection. Do you worry about their own adaptation at all?
I have noticed that the models, as they get more capable, I believe that their output distribution gets more complex. It's harder to learn with a simple model, which is why we've been increasing our model size to capture a higher complexity function that can capture the LLM outputs. So I think we may have to continue to make our models better. We're gonna have to work to keep up with it. We can't just rest on our laurels.
What are burstiness and perplexity?
Yeah, so this is a metric that's used by some AI detectors, but not Pangram. Okay. And so I can explain a bit about how it works. So perplexity is basically a measure of...
This is not perplexity dot AI, the website. This is a technical term.
Okay, this is a metric. This is a measure of how confusing a piece of text is to a language model. So basically, for every token, we can calculate some perplexity, which is basically how expected that token is. So for example, if it's "I went home to my pet" and then the next token is "chinchilla," that'd be a much higher perplexity token
than "my pet dog."
So LLM outputs tend to be low perplexity. They're not going to produce outputs that are surprising to themselves. So this is a decent way to get an AI detector that's around ninety to ninety five percent accurate. But it has some problems. The main one is that you can't improve upon it. Basically it has false positives. Text written by non-native English speakers often is low perplexity, just because when you're...
Don't take as many risks. Exactly.
Yeah, interesting. Yeah, so that's why a lot of the early AI detectors had a bunch of false positives with ESL speakers. It's because their text was low perplexity. So I think this is a very cool metric, but it is not the path for Pangram.
Instead, we went the deep learning approach, so we can do better than that.
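The perplexity idea described above can be made concrete with a small worked example. This is only a sketch: the probabilities below are invented for illustration rather than taken from any real language model, and the helper names `surprisal` and `perplexity` are hypothetical. It uses the standard definition, where perplexity is the exponential of the average negative log probability of the tokens.

```python
import math
from typing import Dict, List

# Hypothetical next-token probabilities after "I went home to my pet ..."
NEXT_TOKEN_PROBS: Dict[str, float] = {"dog": 0.60, "cat": 0.30, "chinchilla": 0.001}

# Hypothetical probabilities for the six prefix tokens "I went home to my pet"
PREFIX_PROBS: List[float] = [0.9, 0.8, 0.9, 0.7, 0.8, 0.6]


def surprisal(prob: float) -> float:
    """Negative log probability of one token: higher means more unexpected."""
    return -math.log(prob)


def perplexity(token_probs: List[float]) -> float:
    """Exponential of the mean surprisal over all tokens in the text."""
    mean_surprisal = sum(surprisal(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_surprisal)


dog_text = perplexity(PREFIX_PROBS + [NEXT_TOKEN_PROBS["dog"]])
chinchilla_text = perplexity(PREFIX_PROBS + [NEXT_TOKEN_PROBS["chinchilla"]])
print(f"pet dog: {dog_text:.2f}, pet chinchilla: {chinchilla_text:.2f}")
```

A detector built on this metric would flag text whose perplexity falls below some threshold, which is also why cautious human writing can be flagged by mistake.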
And what's burstiness? Is that just the opposite side of the coin?
Yeah, burstiness is basically, actually, yeah, I don't know if I can define it.
Okay, fine. Burstiness just sounds like one of those, I guess, manosphere terms, doesn't it? Like, oh yeah, he
has, like, he's been looksmaxxing with high burstiness or something like that.
Yeah, that's great.
Yeah, I think it might just be a measure of sentence length, the ups and downs of the text.
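Taking that tentative description literally, burstiness could be sketched as how much sentence length varies across a text. This is an assumption for illustration only. Definitions differ between tools, and some measure the variation of perplexity across sentences instead. The function names are hypothetical, and the sentence splitter is deliberately naive.

```python
import re
import statistics
from typing import List


def sentence_lengths(text: str) -> List[int]:
    """Split on ., ! or ? and count the words in each non-empty sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (0.0 means perfectly even)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "One two three. Four five six. Seven eight nine."
varied = "No. I went home to my pet chinchilla after a very long day at work. Fine."
print(burstiness(uniform), burstiness(varied))
```

Under this reading, text with big swings between short and long sentences scores higher than text whose sentences are all about the same length.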
If we assume that the world is collectively concerned about AI slop and wants to do something about it, what would be like the single biggest change to the system, either in terms of like the economics of the internet or regulation or technology like what you're developing that would actually help reduce slop.
Yeah, I think the biggest one is norms. So there have been a couple of great blog posts written about how it is rude to send other people undisclosed AI outputs, and I completely agree here. I think, you know, if somebody asks a question on the Internet and then somebody else goes and puts it into ChatGPT and then pastes the answer, it's kind of rude. Like, I was going here to ask the opinions of my friends or, you know, my followers,
not ChatGPT. I could have done that myself.
And so I think, like building this norm is something that you know, it's very new technology, so we need to do it quickly.
But I think this would help a lot for society.
Well, that actually just gets to a question that I have, which is I feel as though the major Internet platforms are actually moving in the exact opposite direction. I mean, I'm stunned. Maybe I accidentally clicked on something at some point, but the frequency with which I get an email and then I open it up to respond in Gmail, and there's that ghost text there, like, do you just want Gemini to respond to this?
I've never done that.
I also consider, I think that would be extremely rude. I've never responded to any email with an AI response. But they're basically telling you to do that. They're doing the exact opposite, blowing up these norms. And so I'm curious from your perspective, you managed to work with Quora, but from your impression, do the major internet platforms think this is a problem worth solving, or from their perspective is it like, you know what, the more content the better?
There's mixed incentives for the big companies.
It's funny because like Google seems to be playing both sides. So like, on one hand, they had that advertisement which people kind of blew up about where it's like, oh, children can now send their heroes notes on like how much they respect them by using AI instead of like writing the note themselves, and like this is wrong, This
is like societally bad. But at the same time, they're working very hard to deal with the AI slop on the Internet in search results to make sure people get served real content and not.
AI slop content.
So I think, I mean, obviously there's a lot of incentives at play around product people who are incentivized to push AI because that is the corporate mandate. But yeah, I think overall, even in my sphere, a bunch of people who are AI researchers, the general consensus is that AI is a powerful tool, but slop is bad.
This reminds me, my parents used to make me do these handmade greeting cards for, you know, Christmas, for all relatives and stuff. And it was supposed to be a demonstration of my commitment to communicating with family. No, no, it traumatized me forever. And I hate greeting cards as a result of doing this, just spending hours
manufacturing these things. But then, secondly, the funniest thing was once we got e-cards, my parents immediately switched to using e-cards. And now, this is also the funniest thing.
My dad uses e-cards.
He figured out that the E card system can tell him whether or not you opened it, so he just uses it as like day to day communication.
Now that's so funny.
Just send an email to your daughter E card.
It's like, I noticed you haven't opened up my E card for International Hot Dog Day. Please let me know what's going on.
I had terrible handwriting as a kid, and my mother made me write all of these handwritten notes to thank people for the gifts I got for
my bar mitzvah.
Yeah, I hated it, but you know what, I have kept connections with all of
those people that have lasted over the years.
In that miserable one week where I just wrote and I got, you know, hand cramps, I think it
Paid off, all right.
Well, imagine doing that for like sixteen years basically in a never ending stream.
Max Spero, thank you so much for coming on Odd Lots. That was a lot of fun. I'm fascinated by this conversation.
Thanks so much for having me. Yeah, really exciting to talk about this. And I think slop is a growing problem, so hopefully we can deal with it.
Of the internet, I
can't tell if I'm surprised by that or not.
And what's it going to be next year at this time?
Oh man, I don't know.
It'll be like hard to stay over with Georgian that for sure.
Yeah, almost certainly crazy.
All right, thanks for coming on.
Odd Lots. Thanks.
Tracy. I love that conversation.
I just think it's a really fun puzzle, right? It seems like a fun question to solve. And I'm fascinated by this idea of how, with both humans and AI, there's gonna be this inevitable gap between what we know and what we can articulate, because you and I both, setting aside AI versus text,
there are things that we both know. For example, this is newsworthy, this is a good episode of a podcast, this is a credible sounding guest, and this isn't. The gap between that and then being able to explain why, it's like, well, you just sort of know it, right? You just sort of have this feeling there, and that intuition is built up from numerous examples, which is the same way, in a sense, that the AI is trained.
It's like these
things that you only know from patterns, and you can see them without fully being able to articulate exactly what's going on.
Well, the other.
question I would have on that is, is it even going to matter in the long run? If you think about, like, so much of the Internet is already built on bots and the sort of false attention economy. If our entire worldview becomes shaped by AI driven drivel, does it matter if the economics of the Internet are still attached to individual bot accounts and things like that?
I don't know if I'm if I'm explaining this, but.
No, no, I think it makes a lot of sense, and I do think like it is important, like we're.
Going to have to change the entire way with them.
And Max said at the beginning, and I've thought about this, that it used to be that if you came across a piece of writing and the punctuation was excellent and the spelling was excellent, and it was cogent sounding, you're like, okay, this has been written by a smart person, I will read this seriously, right? And now there is this complete severance of craft and output, because you could, and you did this, ask Claude to write an argument in
favor of the most absurd proposition imaginable. Ask Claude to write an argument for me that the reason why Reagan wanted to do tax cuts in the early nineteen eighties related to these reports of UFO sightings in the nineteen seventies, and it will write something that not only is it grammatically correct, it'll actually like strain to come up with the best version of this argument before and again if prior to that, having read and like, maybe the person
like this person took this argument seriously, but now this argument is just created ex nihilo. We're going to have to really change our heuristics about this stuff.
We've created an unlimited stream of basically cranks with really good grammar.
Yeah, that's right, that's right, because it used to be we knew the crank because they had bad grammar, or they would email us and like half the words would be in yellow and the other half would be underlined green.
Inlastic exams, the tools that we use to just, like, oh, this person's a crank, you know, half the words are in all caps and stuff like that.
Those don't work anymore.
All right, on that note, shall we leave it there?
Let's leave it there.
This has been another episode of the Odd Lots podcast. I'm Tracy Alloway. You can follow me at Tracy Alloway.
And I'm Joe Weisenthal. You can follow me at the Stalwart. Follow our guest Max Spero. He's at Max Underscore Spero Underscore. Follow our producers Carmen Rodriguez at Carmen Arman, Dashiell Bennett at Dashbot, and Kail Brooks at Kail Brooks. And for more Odd Lots content, go to Bloomberg dot com slash odd lots, where we have a daily newsletter and all of our episodes, and you can chat about all of these topics twenty four seven in our Discord, discord dot gg slash odd lots.
And if you enjoy Odd Lots, if you like it when we talk about how the Internet is forty percent slop, then please leave us a positive review on your favorite podcast platform. And remember, if you are a Bloomberg subscriber, you can listen to all of our episodes absolutely ad free. All you need to do is find the Bloomberg channel on Apple Podcasts and follow the instructions there.
Thanks for listening.
