TechSupport: Nothing’s Private – Even Conversations with ChatGPT

Aug 13, 2025 · 22 min

Episode description

How did around 100,000 ChatGPT conversations end up indexed on Google? Users who sent conversations with a now-defunct “share” feature in ChatGPT made their conversations public — often without realizing they were exposing them to the open web. 404 Media’s Joseph Cox joins Oz and Karah to unpack how it happened, what kinds of chats were revealed, and why everyone should care about this privacy lapse. They discuss why OpenAI may have underestimated the privacy risks, how archived conversations could still be misused, and why both everyday users and major corporations need to rethink what they feed into AI tools.

See omnystudio.com/listener for privacy information.

Transcript

Speaker 1

Welcome to TechStuff. This is Tech Support. I'm Oz Woloshyn and I'm here with Karah Preiss.

Speaker 2

Hey, Oz. Hey, Karah.

Speaker 1

So today we wanted to talk about this ChatGPT feature, which is now defunct, but our friends at 404 Media had a story with the headline "Nearly one hundred thousand ChatGPT conversations were searchable on Google." And as soon as that email hit my inbox, before I'd even read it, I forwarded it to you and to our producer Eliza, and I said, let's jump on this.

Speaker 3

Yeah. You know, part of it is that it taps into this fear that we all have about our most intimate thoughts being made public. This isn't like having a private Instagram account. This is very much between us and ChatGPT. It's a little bit like talking in our sleep. And I think most people who have played around with a chatbot have some questions or responses that they'd rather the general public be blind to. I know I have my fair share.

Speaker 2

Yeah.

Speaker 1

We did that piece recently with Kashmir Hill about AI-induced psychosis and the guy who'd fallen into the rabbit hole by talking with ChatGPT about whether or not he might be living in a simulation. So I started talking with ChatGPT about this to see if I would also be taken down the rabbit hole, and then I was like, oh my god, I'm not sure if I want this to be made public at a later date.

So yeah, OpenAI says they're now working with Google to scrape these conversations off the web, but of course some quick thinkers have already archived them.

Speaker 2

And I can't help but be rather.

Speaker 1

Curious about what it is that people are talking to ChatGPT about.

Speaker 3

I mean, obviously, we do have a segment at the end of every Friday episode called Chat and Me about how our listeners are really using their chatbots, and now we have hundreds of thousands of additional responses to explore.

Speaker 1

Of course, there's a difference between how our listeners tell us they're using chatbots and the reality, which is apparent from these logs. One researcher actually created a data set of all the responses that were indexed by Google, and again our friends at 404 Media were able to take a look. Here to tell us about what everyone's asking ChatGPT is

Speaker 2

404 Media's Joseph

Speaker 3

Cox. Joseph, welcome back to TechStuff.

Speaker 4

Hi, thank you for having me.

Speaker 2

Joseph.

Speaker 1

Let's start at the beginning. How is it that one hundred thousand ChatGPT conversations ended up on Google Search? I thought that these conversations were private.

Speaker 4

Yeah. So this starts with an article in Fast Company on July thirtieth, and that outlet found that ChatGPT conversations were being indexed by Google. That is, as your listeners will know, Google is constantly going around the web and essentially grabbing content from websites. Of course, it can use it to make its search engine.

What was different here was that while ordinarily, when you're talking to ChatGPT, thankfully all of the content of that conversation is private, in this case, what some people had been doing was using, I think, a little-known feature where they could share the contents of that communication. Now, maybe you want to do that because you want to show your friend, wow, look at this really wacky, crazy thing that ChatGPT told me. Or maybe there's a business need, right? Like, hey, I've done this with ChatGPT, now I need to show other people in my team.

And you would select the share feature and this would create a public, essentially a public web page version of that chat, and although you can then send that to your friends or your coworkers, it can also be seen by Google, obviously, and OpenAI probably could have done some stuff to protect it there. But the result is that a bunch of these conversations are now publicly available, are indexed by Google, and I seriously doubt that all of the people using this share feature really understood what they were getting into.

Speaker 1

Yeah, can you elaborate on that? Because I'm thinking about WhatsApp, for example, where there's like a forward button, or like on X, I can do like a share link to a tweet. Is this like somebody thinks they're pressing a button to share an individual version of the transcript with another person, but in so doing is kind of making their whole ChatGPT history visible to Google? Or what's the practical explanation of how this happened?

Speaker 4

Yeah, the users are making that particular conversation publicly available, and it works in a very similar way to the things you just outlined. I sometimes compare it a little bit to a Google Doc link, where you will go and you'll make that public, and there's that setting you can do that says, hey, anybody with this link is going to be able to read your awful article draft. I mean, that would be my case or whatever, or your private thoughts or whatever. But you don't then go and paste that link online, and Google takes steps so that's not included in search engine results. Of course, if you want to post it on a forum or you post it on Twitter, that's going to be something else.

But that's usually how I think most people expect this sort of sharing behavior to work. They expect that, well, I'm going to just share it with one or two people or, you know, a dozen or whatever. They don't expect typically that it's going to be available to anyone on the Internet who knows where to look, or of course anyone with Google now, because Google has archived it as well.

It's sort of a big mix: the user is partly at fault for perhaps not fully understanding what is going on, of course OpenAI maybe not fully explaining what is going on and not taking steps to stop Google indexing, and then of course Google indexing it as well. There's a lot of, maybe blame is too strong a word, there's a lot of blame to go around, I think, to all parties.

Speaker 2

So this is one hundred thousand conversations.

Speaker 1

Do we know how many users those hundred thousand conversations represent? And also, you know, what are some of the things in those conversations?

Speaker 4

Yeah, I don't think I've seen figures that drill down to how many users, but you're right, it's nearly one hundred thousand conversations with this data set the researcher scraped from Google. I mean, before this, some researchers were going through hundreds of conversations and that was already bad enough, and of course newsworthy. What this researcher did was scrape them en masse and put them into a data set.

And I'm actually looking at it now and there's a lot of benign stuff in here. It looks like somebody is making their first iPhone app and they're using ChatGPT for that. There are others where people are clearly discussing sensitive business materials, such as, could you help me write this contract? There is potentially, you know, some bank information in here. I say potentially because it sure looks like bank information.

And then you have, I mean, you mentioned at the top these sort of delusional conversations that some people have with ChatGPT, and I'm sure there is some of that in here. I have seen some people talking about therapy. I have seen some people talking about relationship issues, such as one where it seems to be a man talking about his ex-girlfriend and wondering why she's not looking at his Instagram stories, that sort of thing, which I don't know if I would turn.

Speaker 2

It's just not that into you.

Speaker 4

I mean, yes, I think ChatGPT was trying to say that, basically. So this is only what people have decided to share, which is a very interesting caveat to the data.

Speaker 1

They don't want to share it with the world, but they've chosen at least one other person to share it with, so therefore, by definition, it's not their most private use case.

Speaker 4

Yes, and maybe the researcher or others will be able to do some sort of deeper analysis on this than me. But that's interesting in that, what are the sorts of things that people are willing to share with another person? And of course, you know, what does that tell us about the things they're not sharing? That being said, I don't think anybody wants a security issue where we're actually able to see all of that private data either.

Speaker 3

So this was something that was reported out a few weeks ago, as you said. Has there been any change, and how did OpenAI respond to the exclusive?

Speaker 4

So OpenAI has now disabled this, like, opt-in sharing feature because the company actually said they don't think people fully understood what was going on. And then the company also says it is working with Google to remove some of those indexed results. Because of course there's a few things going on here. There's the exposure in the first place, there's the sharing, there's the indexing by Google.

But even if Google does remove these search results, these chats have been archived by this researcher, and I presume others as well. Like, I seriously doubt there's only one or two people who grabbed all of this data. It's very much an interesting privacy issue that I think researchers want to look into and learn from.

Speaker 3

I don't understand why OpenAI seemed to think that this tool would be useful. Like, have you given that any thought?

Speaker 4

Yeah, I think that people do want to sometimes share the interesting or crazy or insightful stuff they get from GPT. Now, OpenAI probably should have taken steps to ensure that people can share this in a much more private manner, maybe something like you have to add a particular ChatGPT user to the conversation, then they can see it in the same way you add somebody to a Google Doc, for example. That would be a little bit more laborious, there'd be a bit more friction there.

But I'm just interested in why OpenAI did not take more steps to protect this from being scraped by Google. It is possible to share material online without it being touched by search engines. You can ask search engines, hey, if you come across this, please do not index it. I'm curious why OpenAI did not take those steps, and I don't have any insight either way. But the result is that all of these chats have now been indexed on Google, and I think that's pretty significant.
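
For readers who want to see what that "please do not index it" request can look like in practice: search engines generally honor an `X-Robots-Tag: noindex` response header or a `<meta name="robots" content="noindex">` tag on a page. The sketch below is only an illustration under stated assumptions. The function names are hypothetical, and nothing here describes how OpenAI's share pages were actually built.

```python
def shared_page_headers(allow_indexing: bool = False) -> dict[str, str]:
    """Build HTTP response headers for a hypothetical public share page.

    By default the page stays reachable by anyone with the link, but
    crawlers are asked not to index it or follow its links.
    """
    headers = {"Content-Type": "text/html; charset=utf-8"}
    if not allow_indexing:
        # X-Robots-Tag is the header form of the robots "noindex" directive.
        headers["X-Robots-Tag"] = "noindex, nofollow"
    return headers


def robots_meta_tag(allow_indexing: bool = False) -> str:
    """Return the equivalent HTML <meta> tag to place in the page <head>."""
    if allow_indexing:
        return ""
    return '<meta name="robots" content="noindex, nofollow">'
```

Either signal is a request that well-behaved crawlers respect, not an access control. Anyone who has the link can still open, copy, or archive the page.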

Speaker 2

What do you think might happen next?

Speaker 4

What happens next is that I think other companies are going to start checking whether they also have similar issues like this. And I do want to stress, like, this is not the vast majority of ChatGPT conversations or anything like that. ChatGPT was not hacked, it wasn't breached. There was a somewhat niche security issue, but because these tools are becoming so, so popular now, even a relatively niche issue can actually impact a ton of people.

Speaker 3

After the break: so how secure are AI chatbots? Stay with us.

Speaker 1

It's interesting because Sam Altman was recently on Theo Von's podcast and he was sort of pointing out some of the risks, to my surprise, about the privacy issues in ChatGPT. He was saying, like, therapist conversations are protected by HIPAA, lawyer conversations are protected by attorney-client privilege, and people assume that when they're talking with ChatGPT that maybe some of these protections apply, whereas in fact they don't. And I was kind of wondering why he, of all people, was out there on this topic.

I did read some other reporting saying that it may be part of the lawsuit with the New York Times. The New York Times, as part of their discovery in the lawsuit against OpenAI for copyright infringement, are demanding, I think, one hundred million OpenAI conversations for analysis. But I was surprised to hear Altman out there on this nonetheless.

Can you kind of take a step back and maybe reflect on this story about the breach in the broader context of how people are using chatbots and what chatbot makers are incentivized to do or not do to protect their users?

Speaker 4

Yeah, so I haven't seen those comments. But to zoom out a little bit, Altman and other people in the space, they enjoy kind of getting their cake and eating it too, where on one side they will warn about the dangers of AI. They'll say it needs to be regulated, it needs to be taken really very seriously, and also it is coming and there's nothing we can do about it, while also building those tools at the same time and making a lot of money from it.

They actually benefit from being on both sides of the conversation at the same time, and Altman and others very easily switch between those positions depending on the context in which they're talking. So of course, you know, an AI developer can say very, very sensitive stuff is going on here and people need to be careful, and then on the other side they'll say, well, our technology is absolutely suitable for that because we take privacy very seriously or whatever.

I've just kind of got a little bit jaded by all of these companies playing both sides at the same time. And that's why I think you need outside journalists, outside experts, policymakers, activists who can probe it a little bit more, because every time I hear Altman or someone similar make these points about their own technology, I have to remember, yeah, but they're making it.

Speaker 2

Yeah.

Speaker 3

OpenAI is apparently trying to remove the shared content from search engines, but smart people like this researcher accessed and stored it while it was live. While they're using it for an altruistic purpose, I'm wondering if you think people should be concerned. Like, what if they do end up in the wrong hands?

Speaker 4

I don't think people need to necessarily be concerned about this specific breach. I mean, that being said, maybe there's something really, really bad in there and I simply haven't seen it, and the researcher and others are going to continue to dig through it. But people should absolutely be careful with how they are using chatbots. I mean, maybe they used this now-disabled feature and maybe they're going to be concerned about that.

But putting that aside, you have to remember every single command, every single prompt, every single sentence that you put into ChatGPT or any of these other ones, it is going somewhere. It's not just sat on your computer. It's not being locally processed. It's going off to their systems, and ultimately you don't really know what it's being used for. That is, maybe it's used for retraining and improving the training of the system itself, or whether there's some sort of quirk in its security or privacy or sharing settings that ends up with it now being publicly available.

And I know that I'm a little bit more extreme than others, but I would never put sensitive information into one of these things. And I know that plenty of companies are having to implement policies where they tell employees, please do not put confidential information into the chatbot that we don't own. I think people just have to be really, really cognizant of that. In the same way that when we all first got smartphones, we had to learn, oh, right, it's tracking my location data if I turn location data on. I think we need to remember and to learn, oh, when I put this thing into ChatGPT, I don't know exactly where it's going, and it could potentially bite me later if I'm not careful.

Speaker 2

Yeah, And I think it's an important point.

Speaker 1

Just when we think about the stakes of the, you know, OpenAI or ChatGPT logs being indexed and available on Google, because, like, information that, you know, you share with a chatbot that you may think is more or less harmless could have, you know, identifying information or sensitive personal information about addresses or accounts or whatever it may be.

Speaker 2

And so I think there's this kind of almost.

Speaker 1

Willful ignorance which many of us, including me, persist with despite knowing better in terms of how important proper security practices around digital information are. And as you say, like, we're all of a sudden standing on the doorstep of a much more scary reality.

Speaker 4

Yeah, I would say that with security you really have to be proactive rather than reactive. After something has happened, you know, your bank account got broken into or anything like that, sure, you can deal with it, but it's going to be annoying, it's going to be hard, it's going to be tricky, and maybe some people steal some money from you, maybe somebody hacks into your company or something like that. You really should do security proactively if you can.

And that's really a thing that applies to everybody, which isn't to say that it should be on users all of the time. It really is up to the people who make these products, such as ChatGPT by OpenAI or whatever else, for them to put in these guardrails so people can't make these mistakes in the first place.

Speaker 3

You were lucky enough to get a hold of this data set by this researcher. Do you know what the researcher is planning to do with the information?

Speaker 4

Not specifically, beyond analyzing it for trends, I believe, seeing what is in there. Absolutely no criminal activity or anything like that. But again, that's not to say that other people may not be doing that as well.

I can imagine a situation in which, let's say, and this is a hypothetical, but I'm sure I can find something that would reflect this in some sort of data set. Let's say you were using ChatGPT or something similar to make a quick prototype app for your company. In that you include your username and password and access keys for the infrastructure of your company to make that app. It's all well and good, it works, and it accidentally gets shared in a database like this. Someone who is malicious could then go in: well, thank you very much for those access keys. I'm now going to break into XYZ company.

And although we haven't seen that happen specifically with this data set, that sort of stuff happens constantly where, you know, an engineer at a company, even a very junior one, will put those keys in code which is accidentally exposed online. It's accidentally publicly available, and that's how we end up with data breaches.
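
One way to picture the proactive habit implied by that hypothetical is to check a prompt for obvious credentials before it ever leaves your machine. The sketch below is a minimal illustration, not a complete secret scanner. The function name and pattern list are assumptions made for this example, and the sample key is the placeholder value used in AWS documentation.

```python
import re

# Illustrative patterns only; dedicated secret scanners cover many more formats.
PATTERNS = {
    # AWS access key IDs are 20 characters starting with AKIA or ASIA.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Things like "password=hunter2" or "api_key: abc123".
    "credential_assignment": re.compile(
        r"(?i)\b(?:password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*\S+"
    ),
    # PEM-encoded private key headers.
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_possible_secrets(text: str) -> list[str]:
    """Return the names of every pattern that matches somewhere in the text."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    prompt = "Deploy with key AKIAIOSFODNN7EXAMPLE and password=hunter2"
    hits = find_possible_secrets(prompt)
    if hits:
        print("Do not paste this; possible secrets found:", ", ".join(hits))
```

A check like this only catches well-known formats, so it complements a policy of keeping credentials out of prompts rather than replacing it.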

Speaker 1

Now, yeah, I mean, as AI is being marketed as a tool for work, obviously, the leverage an individual consumer has versus OpenAI or Google is really limited, right? Like, you know, I can complain and holler and post on Reddit, and journalists like you can pick it up.

But when, you know, Pepsi or Ernst & Young has concerns about how its employees' chats are being handled by third-party companies, that perhaps, you know, can drive change more rapidly, given these are like big corporate spenders. So I'm curious, do you know anything about what the conversations are like, the kind of B2B conversations around operational security for LLMs?

Speaker 4

Well, I mean, I would also draw a parallel even just with the intellectual property one, where a lot of these companies weren't really paying attention until somebody was taking Mickey Mouse and doing some very strange things with AI with it, for example. And now of course we have the lawsuit, you know, between Disney and Midjourney, for example, which is an AI image generator engine.

When it comes to security, I don't know about the specific conversations, but it's absolutely something that people need to be educated about inside their companies. Funny enough about Disney, there was a breach of Disney, I think a year ago at this point, and that started because one of their employees downloaded a piece of software that they believed was some sort of AI agent or some sort of AI generation tool. Hidden inside that was malware which then stole passwords, and which then logged into Disney's Slack and stole a mountain of data. And it turns out the hacker behind this had been deliberately putting malware into their own custom AI tools to try to get unsuspecting people to download it.

So this is a real threat to anybody working, I think, in any sort of company. Hackers do not care really who you are. They only care what you may or may not have access to, and AI is just another consideration of that, whether that's the data that an employee is inadvertently putting into ChatGPT or a sketchy tool that someone may download. You know, like, this is something that we have to live with now.

Speaker 2

Joseph, thank you, Thank you, Joseph, thank you so much.

Speaker 3

For TechStuff,

Speaker 1

I'm Karah Preiss. And I'm Oz Woloshyn. This episode was produced by Eliza Dennis and Tyler Hill. It was executive produced by me, Karah Preiss, and Kate Osborne for Kaleidoscope and Katrina Norvell for iHeart Podcasts. Jack Insley mixed this episode and Kyle Murdoch wrote our theme song.

Speaker 3

Join us on Friday for The Week in Tech. Oz and I will run through the tech headlines you may have missed.

Speaker 1

And please do rate and review the show wherever you listen to your podcasts, and also send us a note at techstuffpodcast@gmail.com with any comments or suggestions.

Transcript source: Provided by creator in RSS feed: download file