¶ Intro: Meet Nick Turley and Mark Chen
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. My guests today are Mark Chen, who is the Chief Research Officer at OpenAI, and Nick Turley, who is the Head of ChatGPT. We're going to be talking about the early viral days of ChatGPT, about ImageGen, how OpenAI looks at code and tools like Codex,
what kind of skills they think we might need for the future, and we're going to find out how ChatGPT got its totally normal name. "Even half of research doesn't know what those three letters stand for." "You know, you're going to have an intelligence in your pocket. It can be your tutor, it can be your advisor, it can be your software engineer." "There's a real decision the night before: do we actually launch this thing?" First off, how did OpenAI decide on that awesome name?
¶ Origin of the name "ChatGPT"
It was going to be "Chat with GPT-3.5," and we had a late-night decision to simplify. Wait, wait. Say that name again. It was going to be "Chat with GPT-3.5," which rolls off the tongue even more nicely. And you said that was a late-night decision, meaning, like, weeks before you finally decided what to call it? Right, right, right. No, weeks before we hadn't even started on the project. Oh, goodness. But, you know, I think we realized that that would be hard to pronounce, and we came up with a great name instead. So that was the night before? Roughly. Might have been the day before. It was all kind of a blur at that point.
I would imagine a lot of that was a blur. And I remember being in a meeting when we talked about the "low-key research preview," which really was how we thought of it, because it was 3.5. 3.5 was a model that had been out for months. And from a capabilities point of view, when you just looked at the evals, you'd say, yeah, it's the same thing, but we put the interface on it and made it so you didn't have to prompt as much.
And then ChatGPT comes out. When was the first sign that this thing was blowing up? I'm curious. Everyone has their own slightly different recollection of that era, because it was a very confusing time. But for me, day one was sort of, you know, "Is the dashboard broken?" Classic "the logging can't be right." Day two was, "Oh, weird, I guess Japanese Reddit users discovered this thing. Maybe it's a local phenomenon." Day three was, "Okay, it's going viral, but it's definitely going to die off." And then by day four it was, "Okay, you know, it's going to change the world."
Mark, did you have any expectation about that? No, honestly. I mean, we've had so many launches, so many previews over time, and this one really was something else. The takeoff ramp was huge. And, yeah, my parents just stopped asking me to go work for Google. Wait, wait a second. Up until ChatGPT, your parents were asking what you were doing here? Yeah, I mean, they'd just never heard of OpenAI.
I think for many years they thought AGI was this pie-in-the-sky thing and that I didn't have a serious job. So it was a real revelation for them. What was your job title at the time? I think just member of technical staff. Member of technical staff, yeah. And then this thing blows up, and now you're head of research. I guess so, yeah. Actually, on the GPT name: I think even half of research doesn't know what those three letters stand for. It's kind of funny. Half of them think it's "generative pre-training." Half of them think it's "generative pre-trained transformer." And what is it? It's the latter. Okay. So those people don't even know the name of the thing. It's weird how just a silly name like that all of a sudden becomes a thing. You'd see that with Google, Yahoo, Kleenex, Xerox, things like that. Some of those were names by intention, and this was really just a silly sort of name. For me, the moment, after watching the launch, watching it accelerate, when I knew it was going to happen: it was when it was on South Park, when South Park made fun of the name.
¶ ChatGPT's viral takeoff
That was the first time I'd watched South Park in, let's just say, a while. And that episode, I still think it's magic. It was obviously profound to watch something you helped make show up in pop culture. But there's the punchline at the end where it's like, oh, this was co-written by ChatGPT. I think they took that off, though, in later episodes, because it used to say "written by Trey Parker and ChatGPT," and then I think they may have pulled that at some point. I don't remember. Well, I strongly feel that you shouldn't have to give credit to it. If I had to give credit to ChatGPT for every aspect of my life, well, might as well just say "ChatGPT, maybe with Andrew." So, do you use it to prep for your interviews?
You know, one of my co-producers, Justin, probably uses it. I haven't asked him yet, because I'd like to think that he's handcrafting every single question we're thinking about here, but I am sure. You say it was a bit of a blur. I'll tell you a standout moment for me at the launch of ChatGPT. I don't know if you remember this, but the Christmas party. We'd had several weeks of ChatGPT out there,
and Sam Altman went up and said, hey, it's been exciting to watch this, but the internet being the internet, and I think we all felt this way, it's going to die down. Spoiler alert: it did not die down.
And it just kept accelerating. What were the things you had to do internally to keep this thing up and running as more people wanted to use it? We had quite a few constraints, and for the few who remember, ChatGPT was down all the time in the beginning.
We'd said, hey, this is a research preview, no guarantees, and maybe it goes down. But the minute you had people loving and using this thing, that didn't feel super good. So people were certainly working around the clock to keep the site up. I remember we obviously ran out of GPUs, we ran out of database connections, we were getting rate-limited by some of our providers. Nothing was really set up to run a product. So in the beginning we just built this thing we called the "fail whale," and it would tell you, kind of nicely, that the thing was down, with a little poem, I think generated by GPT-3, about being down. It was sort of tongue-in-cheek. That got us through the winter break, because we did want people to have some sort of a holiday. And then when we came back, we were like, okay, this is clearly not viable. You can't just go down all the time. And eventually we got to something we could serve everyone.
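(A minimal sketch of that graceful-degradation idea, assuming a simple Flask app. The framework, the capacity check, and the poem here are illustrative stand-ins, not the actual implementation.)

```python
# Hypothetical "fail whale" fallback: when capacity runs out, return a
# friendly 503 with a poem instead of a raw error page.
from flask import Flask

app = Flask(__name__)

DOWN_POEM = (
    "ChatGPT is at capacity right now,\n"
    "our GPUs are taking a bow.\n"
    "Please check back soon!"
)

def at_capacity() -> bool:
    # Stand-in for a real load check (GPU pool, DB connections, rate limits).
    return True

@app.route("/")
def index():
    if at_capacity():
        # 503 plus Retry-After tells clients this is temporary.
        return DOWN_POEM, 503, {"Retry-After": "120"}
    return "Welcome."
```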
Yeah, and I think the demand really speaks to the generality of ChatGPT. We had this thesis that ChatGPT embodied what we wanted in AGI just because it was so general. And I think you're seeing that demand ramp because people are realizing, you know, any use case I want to throw at the model, it can handle. We were kind of known as the company working on AGI, and prior to ChatGPT,
the API was certainly the first time we had a public offering people could go use, but that was more for developers. And I think as long as people were thinking "AGI," that seemed to be the point at which they thought these models would be useful. But we saw GPT-3, we saw that that was useful, and then we saw that it could do other things that were useful.
¶ Internal debate before launch
Was everybody at OpenAI on board with ChatGPT being useful or being ready to launch? Yeah, I don't think so. You know, even the night before, there's this very famous story at OpenAI of Ilya taking ten cracks at the model, ten tough questions. And my recollection is that on maybe only five of them he got answers he thought were acceptable. And so there was a real decision the night before: do we actually launch this thing?
Is the world actually going to respond to this? And I think it just speaks to how, when you build these models in-house, you so rapidly adapt to the capabilities that it's hard to put yourself in the shoes of someone who hasn't been in this model-training loop and see that there is real magic there. Yeah. To build on that, the back-and-forth internally about, you know, is this thing good enough to launch:
it's humbling, right? Because it's just a reminder of how wrong we all are when it comes to AI. It's why frequent contact with reality is so important. Could you elaborate on that contact with reality? What does that mean? Yeah. When you think about iterative deployment, one way I like to frame it is: there's no point everyone agrees on where it's suddenly useful. Usefulness is this big spectrum.
And so there's not one capability level, one bar you meet, where suddenly the model is useful for everyone. Were there any hard decisions about what to include or what to focus on? We were very, very principled on ChatGPT to not balloon the scope. We were adamant about getting feedback and data as quickly as we could. I'm always in Slack telling you things, by the way. I'm like, Nick, add this, add this.
I remember actually there was a lot of controversy about the UI side. For example, we didn't launch with history, even though we thought we would probably want that. And guess what? That was the first request. There's also always the question: can we train an even better model with two more weeks of time?
I'm glad we didn't, because I think we got a ton of feedback as it was. So yeah, there were a ton of those scope discussions, and, you know, the holidays were coming up, so we had this natural forcing function for getting something out. Yeah, there's this habit where,
if something is going to come out after a certain point in November, it's not going to come out until, like, February. There's a sort of window where things fall on either side. Well, that would be the classic mindset at a big tech company. I think we're definitely a bit more flexible than that. I felt like one of the big impacts was:
once people were out using it, the rate at which these things improved was tremendous. I don't know if that was something you really had in the calculus. We could certainly think about training on larger datasets, scaling compute, but then there's the idea of actually having the signal you get from that many people using it. Yeah, I think over time feedback really has become an integral part of how we build the product. And it's also become an integral part of safety.
¶ Evolution of OpenAI's launch approach
And so you always feel the time cost of losing out on feedback. You can deliberate in a vacuum, right? Are they going to respond to this better? Are they going to respond to that better? But it's just not a substitute for getting it out there. I think our philosophy is:
let the models have contact with the world, and if you need to revert something, that's fine. There's really no substitute for this fast feedback, and it's become one of the big levers for how we improve model performance, too. It's sort of funny.
I feel like we started out shipping these models in a way that's more similar to hardware, where you make one launch very rarely, it has to be right, you're not going to update the thing, and then you work on the next big project. It's capital-intensive and the timelines are long.
And over time, and I think ChatGPT was kind of the beginning, it's looked more like software to me, where you make frequent updates, you have a constant pace, the world can adopt it, and if something doesn't work, you roll it back.
You lower the stakes in doing that, you increase the empiricism, and of course, operationally, you can innovate faster and in a way that's more and more in touch with what users want. Yeah, one of the examples we had of that was the model becoming
too obsequious or sycophantic. Could you explain what happened there? That was where people all of a sudden said, hey, it's telling me I've got a 190 IQ and I'm the most handsome person in the world. Which I had no problem with personally, but other people did. What was going on there?
¶ The sycophancy incident and RLHF
Yeah, so I think one important thing is that we rely on user feedback to improve the models, right? And it's this very complicated mix of reward models, which we use in a procedure we call RLHF: using human feedback to improve the models with RL. Can you give me just a brief example of what that would mean? Yeah, yeah. So one way to think about it is: when a user enjoys a conversation, they provide some positive signal.
A thumbs up, for instance. And we train the model to prefer to respond in ways that would elicit more thumbs up. This may be obvious in retrospect, but stuff like that, if balanced incorrectly, can lead to the model being more sycophantic. You can imagine users might want that feeling of a model saying good things about them, but I don't think it's a very good long-term outcome.
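(To make that concrete, here is a toy sketch, with entirely hypothetical names and weights, of how a thumbs-up signal might be folded into a scalar reward for RLHF-style training, and why an unbalanced mix can drift toward flattery. The real pipeline is far more complex than this.)

```python
# Toy reward shaping for RLHF-style training (illustrative only).
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    prompt: str
    response: str
    thumbs_up: bool  # explicit user signal

def sycophancy_score(text: str) -> float:
    # Placeholder heuristic; a real system would use a trained classifier.
    markers = ("you're absolutely right", "great question", "genius")
    return sum(m in text.lower() for m in markers) / len(markers)

def reward(event: FeedbackEvent, penalty_weight: float = 0.5) -> float:
    """Map raw feedback to a scalar reward.

    If thumbs-up were the only term, the policy would be pushed toward
    whatever elicits thumbs-up, including flattery. Counterweights like
    the penalty below have to be balanced in, and the weighting is the
    hard, empirical part.
    """
    base = 1.0 if event.thumbs_up else -1.0
    return base - penalty_weight * sycophancy_score(event.response)
```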
And actually, when we look at our response to sycophancy and the rollout that resulted, I think there were a lot of good points about it. This was something that was flagged by just a small fraction of our power users; it wasn't something that a lot of people who generally use the models noticed. And I think we picked it out fairly early and responded with the appropriate level of gravity. It just shows that we really do take these issues
quite seriously, and we want to intercept them very early. Yeah, it felt like there were maybe 48 hours after the model came out, and then Joanne Jang had a response explaining exactly what happened. And I think that's the hard part: how do you navigate that? Because the problem with social media is you're basically monetized by engagement time. You want to keep people on there longer so you can show them more ads. And certainly,
the more people use ChatGPT, obviously there's a cost to OpenAI. Maybe ideally they'd use it once and stick around forever, but that's not practical. How do you weigh that, the idea of making people happy with what they're getting, versus making the model
broadly more useful rather than just pleasing? I feel very lucky in this regard, because we have a product that's very utilitarian. People use it either to achieve things they know how to do but don't feel like doing, faster or with less effort, or to do things they couldn't do at all. The first example is maybe writing an email you've been dreading. The second might be running a data analysis you didn't actually know how to do in Excel. True story.
So those are very utilitarian things. And fundamentally, as you improve, people actually spend less time in the product, right? Because ideally it takes fewer turns back and forth, or maybe you actually delegate to the AI so you're not in the product at all.
So for us, time spent is very much not the thing we optimize for. We do care about your long-term retention, because we think that's a sign of value. If you're coming back three months later, that clearly means we did something right. But what that means is, you know, I always say: show me the incentive and I'll show you the outcome. We have, I think, the right fundamental incentives to build something great. That doesn't mean we'll always get it right. The sycophancy
events were really, really important and good learning for us, and I'm proud of how we acted on it. But fundamentally, I think we have the right setup to build something awesome. So that brings up a challenge I wonder how you navigate. One of the things early on, when ChatGPT came out, there were the allegations: it's woke, it's woke, and people are trying to promote some sort of agenda.
¶ Balancing usefulness vs. neutrality in model behavior
My argument has always been: if you train a model on corporate speak, average news, and a lot of academia, it's going to follow from that. And I remember Elon Musk was very critical about it. And then when he trained the first version of Grok,
it did the same thing, and he was like, oh yeah, that's what happens when you train on this sort of data. And internally at OpenAI there were discussions about how to make the model not try to push you, not try to steer you. Could you go into how you try to make that work? Yeah. So I think,
at its core, it's a measurement problem. And I think it's actually bad to downplay these kinds of concerns, because they're very important. We need to make sure that the default behavior you get is something that's centered, that doesn't reflect bias on the political spectrum or on many other axes of bias. And at the same time, you do want to allow the user the capability, if they wanted to talk to
a reflection of something with more conservative values, or liberal values, to steer that a little bit. And so I think the thing is, you want to make sure that the defaults are meaningful and centered, and that's a measurement problem. And you also want to give users some flexibility, within bounds, to steer the model toward a persona they want to talk to. I think that's right. And in addition to
neutral defaults and the ability to bring your own values to some extent, I think being transparent about the whole thing is really, really important. I'm not a fan of secret system messages that try to hack the model into saying or not saying something. What we've tried to do is publish our spec, so you can go look: if you're getting certain model behavior, is that a bug, a violation of our own stated spec?
Or is it actually in the spec, in which case you know who to criticize and who to yell at? Or is it underspecified in the spec, in which case that lets us improve it and add more specificity to that document. So publishing the rules the AI is supposed to be following: I think that's an important step toward having more people contribute to the conversation than just the people inside of OpenAI.
So we're talking about the system prompt, the part of the instructions the model gets before the user puts in their input? Well, I think it's more than that. The system prompt is one way to steer the model, but it goes much deeper than that. Yeah, we have a very large document
that outlines, across a bunch of different behavior categories, how we expect the model to behave. And just to give you an example: you can imagine someone comes in with an incorrect belief, a factually incorrect point of view. How should the model interact with that user? Should it reject that point of view outright, or should it collaborate with the user on figuring out what's true together?
And we take that latter point of view. And I think there are a lot of very subtle decisions like this that we put a lot of time into.
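(For illustration, steering within bounds might look something like this with the OpenAI Python SDK. The model name and the instructions are placeholder choices; as discussed above, the published Model Spec, not a hidden prompt, is what governs the defaults underneath.)

```python
# Sketch: visible, user-controlled steering via a system message.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        # Steering the persona, not overriding the spec's hard limits.
        {"role": "system",
         "content": "Be blunt and concise. Push back on factual errors."},
        {"role": "user", "content": "The Earth is flat, right?"},
    ],
)
print(resp.choices[0].message.content)
```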
Yeah, that's a hard one, because some things you can test for and figure out in advance, but when you're trying to figure out how an entire culture is going to adopt something, that's challenging. Like, if I was someone who was convinced that the world was flat, how much should the model push back against me? Some people are like, oh, it should push back all
the way. But okay, what if it's one religion or another? It turns out rational people, well, many people, can disagree on how the model should behave in these instances. And you're not always going to get it right, but you can be transparent about what approach we took, and you can allow users to customize it. This is our approach. I'm sure there are
ways we can improve on it, but by being transparent, in the open, about how we're trying to tackle it, we can get feedback. How are you thinking about, as people start to use these models more and more, and regardless of whether that's some dial you're trying to turn, it's just that the more useful it becomes, the more people want to use it. There was a time when nobody wanted a cell phone, and now
we can't get away from them. How are you thinking about the relationships people are forming with these systems? Obviously, I mentioned this earlier: this is a technology you have to study. It's not designed in a static way to do X, Y, Z; it's highly empirical. So as people adopt it, the way they use the product is something we need to go understand and act on as well. I've been
observing this trend with interest, where an increasing number of people, especially Gen Z and younger populations, are coming to ChatGPT as a thought partner. And I think in many cases that's really helpful and beneficial, because you've got
someone to brainstorm with on a relationship question, or on a professional question, or something else. But in some cases it can be harmful as well, and I think detecting those scenarios, and first and foremost having the right model behavior,
is very, very important to us. So we're actively monitoring. In some ways it's one of those problems we're going to have to grapple with, because with any technology that becomes ubiquitous, it's going to be dual use. People are going to use it for all this awesome stuff, and people are going to use it in ways that we wish they didn't.
And we have some responsibility to make sure we handle that with the appropriate gravity. I find myself having longer conversations with it. I like the memory function; I like the fact that you can turn it off if you don't want it. And I think about,
¶ Memory and the future of personalization
you know, what's this going to be two years from now, three years from now, when it has a much longer memory, much more context? I like the idea of having these sort of Memento, anonymous modes too, where it's not going to store things. But I wonder how much you've been thinking about two, three years down the road. What's that going to be like, when ChatGPT knows way more about you?
Yeah, I mean, I think memory is just such a powerful feature. In fact, it's one of the most requested features when we talk to people externally. It's like, this is the thing I'd really pay more for. And I liken it to having a personal assistant, if you've ever had one... No, I haven't. Well, you do need to build up... Not relatable. Sorry, guys. But, you know, it's
like any relationship you have with a person: you build up context with them over time, and the more they know about you, the richer the relationship, and the more it can help you. You can work together to collaborate on tasks. I do become self-conscious of the fact that it knows everything about me when I'm grumpy. And I've argued with it recently, by the way. That's good.
You should be able to argue with it. You understand a lot about yourself by having a thing to argue with. And I think you spare others that experience, which can also be beneficial. Don't argue about math and science, though. You're not going to win those. Increasingly unlikely.
Yeah, I think memory is cool. To Mark's point, it's been part of our vision for a long time, because we said we were going to build a super assistant before we really knew what that meant. ChatGPT was sort of the early demonstration of that idea.
But if you think about real-world intelligences, even they are not particularly useful on their first day. And being able to solve that problem, or begin to solve it, has been profound. To your earlier question, though:
it really does feel like, if you fast forward a year or two, ChatGPT or things like it are going to be your most valuable account by far. It's going to know so much about you. And that's why I think giving people ways to talk with this thing in private
is very important. We make this temporary chat thing literally available on the home screen, because we think it's increasingly important to be able to talk about stuff off the record, too. So it's an interesting question, and I think privacy and AI is going to be an interesting one for the coming years. I want to switch gears and talk about another release which again kind of caught people by surprise and blew up: ImageGen. I was here for
DALL·E, DALL·E 2, and then DALL·E 3. And I thought DALL·E 3 was a very capable model, but it seemed like it preferred a certain kind of image, and a lot of the utility and the capability for variable binding was sort of
¶ ImageGen's breakthrough moment
hidden away. And then ImageGen was just this breakthrough moment that caught me off guard. How did you guys feel about that launch?
Yeah, honestly, it caught me off guard too. And real props to the research team here. Gabe in particular did a ton of work, and Kenji and many others on the team did phenomenal work. I think it really spoke to this thesis that when you get a model good enough that, in one shot, it can generate an image that fits your prompt, that's going to create immense value. And I think we never quite had that before: that you just get the perfect generation, oftentimes on the
first try. And I think that's something very powerful. People don't want to pick the best out of a grid. You got very good prompt following, and this great style transfer too, right? This ability to put images in as context for the model to modify and change, and the fidelity with which you could do that. I think that was really powerful for people. I think this ImageGen experience
was just kind of another mini-ChatGPT moment all over again, where you've been staring at this thing for a while, and you're like, yeah, it's going to be cool, I think people are going to like it. But you're launching, like, 20 different things, and then suddenly
the world is going crazy in a way that you only find out by shipping. Like, I remember distinctly: we had something like 5% of the Indian internet population try ImageGen over a weekend. And I was like, wow, we're reaching new types of users, people who might not have thought of using ChatGPT. That's really cool. And to Mark's point, I think a lot of it is because
there's this discontinuity where something suddenly works so well, and truly the way you expected, that it blows people's minds. And I think we're going to have those moments in other modalities too. Voice
hasn't quite passed the Turing test yet, but the minute it does, I think people are going to find it immensely powerful and valuable. Video is going to have its own moment, where it starts meeting the expectations users have. So I'm really excited about the future, because I think there are so many of these magical moments coming that are really going to transform people's lives, and also change
ChatGPT's relevance for people. Because I've always felt like there are text people and there are image people, and some of them are a little bit different. And now they're all using the product and discovering the value across the board. The moment when it launched, I think it kind of illustrated what
the problem had been with image models before. When DALL·E came out, it was super exciting, because you're like, I'm doing pictures of space monkeys and all these sorts of things. But the moment you try to do a really complex image, and that's the phrase I brought up before, variable binding, you start to see these things drop off. That was when I realized, oh, there's going to be a challenge for other image systems that don't have the scale and the compute of something like a GPT-4 under the hood.
Now, was it basically that, taking a GPT-4-scale model and saying, now you do images, that made the breakthrough? Well, I think there were a lot of different parts of research that made this such a big success. With a complicated multi-step pipeline, it's never just one thing. It's very good training, it's very good post-training, and it's all of that coming together.
Variable binding definitely was one thing we paid a lot of attention to. One thing about the ImageGen launch is that it was a launch that was very deep. People started by creating anime versions of themselves,
but you realize, when you play with it more, the infographics work. Oh yeah, you can actually create charts. Comic book panels. Yeah, and you can mock up what your home would look like with furniture in it. I've heard all these things from users that are completely surprising.
We did the podcast setup by literally taking some photos of chairs in the room, putting them in, and saying, create a better setup. And it was amazing. So we've seen a lot of that. There were a lot of the anime-style images, which, to sum it up, was just sort of this weird thing where it was simply better than what we'd seen before. And I don't think anybody was ready to be really surprised by an image model in that way, internally or externally.
What were some of the things that surprised you, some of the new things you saw people doing? Yeah, I'll tell you a quick story there, because up until the day of the launch, we were trying to figure out the right use case to showcase. And
I'm so glad we ended up on anime styling. Everyone looks good as an animated character. That's true. I mean, it's funny. With the original ChatGPT, I thought it would be a strictly utilitarian product, and then I was surprised that people used it for fun.
In this case, it was sort of the opposite. I was like, okay, this is going to be really cool for memes, people are going to have fun with this thing. But then I was really surprised by all the genuinely useful ways of using ImageGen. Whether it's planning your home project, as I mentioned earlier,
you're doing construction and you want to see what things would look like with this remodel or this furniture, or you're working on a slide deck for an important presentation and you just want really useful, consistent illustrations that are on topic. So I really have been
personally surprised by the utility in this case, because I knew it would be fun; that was not the question. Yeah, I think I used it to generate a tier list of AI companies, and it put OpenAI at the top. You win, model. What good post-training. Yeah, it just happened. Who knew? What has been
the thinking? And it's changed, because I remember originally with DALL·E, the idea was, okay, we have to be very controlled about what it can do and what it can't do. I remember when we first launched, you couldn't do people, which made for a not very useful model, and then eventually that got rolled back. How much of that was cultural shift, how much was a technological ability to control for things, and how much was just saying, we've got to push the norms?
¶ Cultural shifts in safety and the freedom to explore
I would say it was both a cultural shift and an improvement in our ability to control things. The cultural shift, I'm not going to deny it. I think when I joined OpenAI, there was a lot of conservatism around what capabilities we should give to users, maybe for good reason: the technology was really new, a lot of us were new to working on it, and if you're going to have a bias, biasing toward safety and being careful
is not a bad DNA to have. But I think over time we learned that there are so many positive use cases that you effectively prevent when you make arbitrary restrictions in the model. What about faces? Why can't I make any face I want? So this is a good example of a capability that's got pros and cons, and you can err on one side or the other. When we first shipped image uploads into ChatGPT,
we had some debates about which capabilities to allow versus where to be conservative. And one debate we had was: do we allow the upload of images with faces? Or rather, when you upload an image that contains a face, should we just gray out the face? Because you'd avoid so many problems, right? You can make inferences about people based on their face. You could say mean things to people based on their face.
You would take a giant shortcut past all the gnarly issues if you didn't allow it. But I've always felt we need to err on the side of freedom, and we need to do the hard work. And in this case, there are so many valid uses. If I want feedback on makeup, or on my haircut, or anything like that, I want to be able to talk to ChatGPT about it. Those are valuable and benign use cases, and I would prefer to allow them, and then study where that falls short, where it's harmful,
and iterate from there, versus taking a default stance of disallowing. And I think that's one of the ways our stance and posture has changed a bit over time, in terms of where we start. Yeah, we're very good, I think, at imagining worst-case scenarios. What if someone uses these faces to evaluate hires for a company, or whatever. But also it's like, hey, is this eczema? There's a lot of utility there. And honestly, I think there are certain
domains of AI safety where worst-case-scenario thinking is very appropriate. I think that's an important way of thinking about risk when it comes to certain forms of risk that are existential, or even just very, very bad. We have the Preparedness Framework, which helps us reason through some of those things. Can the AI let you make a bioweapon? It's good to think about the worst case there, because it'd be really, really bad.
So you have to have that way of thinking in the company, and you have to have certain topics where you think about safety in that way. But you can't let that kind of thinking spill over into other domains of safety where the stakes are lower, because you end up making very, very
conservative decisions that block out many valuable use cases. So being principled about different types of safety, on different time horizons and with different levels of stakes, is very important for us. I think I want a blunt mode sometimes,
because right now, will it actually roast you? I mean, I'll ask the model, with the voice-in, speech-out model, like, do I sound tired? And it's like, well, you know, I don't really want to say... And I'm like, come on, you're trying to get it to be honest.
You know, I think there are many cultures that would prefer a blunter ChatGPT. Yeah. Very much on the radar. Just to piggyback off Nick's answer: I think it's the iterative deployment that gives us the confidence to push toward user freedom. We've had many cycles of this; we know what users can and can't do, and that gives us the confidence to launch with the restrictions that we do.
One of the other generative capabilities that's been very interesting is code. I remember early on with GPT-3, we saw that all of a sudden it could spit out React components, and we saw, oh wow, there's some utility there. And then
we actually trained a model more specifically on code, and that led to Codex. Then we had Code Interpreter. Now Codex is somehow back in a new form. Same name, but the capabilities keep increasing.
¶ Code, Codex, and the rise of agentic programming
And we've seen code work its way first into VS Code via Copilot, and then Cursor, and then Windsurf, which I use all the time now. How much pressure has there been in the code space? Because I'd say that if we asked people who made the top code model, we might get different answers. Yeah, and I think it reflects that when people talk about coding, they're talking about a lot of different things.
There's coding in a specific paradigm, like if you pull up an IDE and you want a completion on a function, and it's very different from agentic-style coding, where you ask, I want this PR. Could you unpack a little what you mean by agentic coding? Yeah. So you can draw a distinction between more real-time response models, and you can think of ChatGPT,
to first order, as: you ask a prompt, and you get a response fairly quickly. And then a more agentic-style model, where you give it a fairly complicated task, you let it work in the background, and after some amount of time it comes back to you with what it thinks is something close to the best answer. And I think we see, increasingly, that the future will look more like that async paradigm,
where you're asking it very difficult, hard things, and you're letting the model think and reason and come back to you with the best version of what it can produce. We see the evolution of code that way too. Eventually we do see a world where you give a very high-level description of what you want, and the model takes time and comes back to you. And I think our first launch of Codex really reflects that paradigm:
we're giving it PRs, units of fairly heavy work that encapsulate a new feature or a big bug fix, and we want the model to spend a lot of time thinking about how to accomplish this thing, rather than give you a fast response.
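(The interaction shape Mark is describing, fire off a heavy task and come back later, can be sketched with asyncio. Everything here, task names and timing included, is a stand-in; real Codex tasks run in a sandboxed cloud environment, not a local coroutine.)

```python
# Toy sketch of the async, delegate-and-return interaction paradigm.
import asyncio

async def agentic_task(description: str) -> str:
    # Stand-in for a long-running agent: plan, edit, run tests, iterate.
    await asyncio.sleep(2)  # minutes or hours of work in real life
    return f"Proposed PR for: {description}"

async def main():
    # Kick off several heavy tasks, then review results when they're done,
    # rather than steering the model turn by turn.
    tasks = [
        asyncio.create_task(agentic_task("fix flaky auth test")),
        asyncio.create_task(agentic_task("add pagination to /users")),
    ]
    for result in await asyncio.gather(*tasks):
        print(result)

asyncio.run(main())
```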
And to your question, coding is such a giant space. It's kind of like talking about knowledge work, or something incredibly broad, which is why I don't think there's one winner, one best thing. There are so many options, and developers are the lucky ones, because they have so many choices right now. And I think that's fundamentally exciting
for us too. But to Mark's point, this agentic paradigm has been particularly exciting for us. One framing I often use when thinking about product here is: I want to build products that have the property that when the model gets 2x better,
the product is 2x more useful. And ChatGPT has been a wonderful thing, because for a long time I think that was true. But as we look at smarter and smarter models, I think there's some limit to people's desire to talk to, like, a PhD student, versus valuing other attributes of the model, like its personality and what it can actually do in the real world. Experiences like Codex, though,
create the right container, such that we can drop in smarter and smarter models. And it's going to be quite transformative, because you get the interaction paradigm right, where people can specify a task, give it all the time it needs, and
then get a result back. So I'm really excited about where it's going to go. It's an early research preview, but just like with ChatGPT, we felt it would be beneficial to get feedback as early as possible, and we're excited about where we're going to take it. I was using Sonnet a lot, which I love. I think Sonnet for coding is fantastic. But with o4-mini on the medium setting in Windsurf, I found it was great. Once I started using that, I was really happy, because of,
one, the speed, and everything else like that. And I think there are very good reasons why people like other models, and I don't want to get into comparisons, but I found that for the kinds of tasks I was doing, this was the first time I was very happy you guys put that out there. Absolutely. Yeah. And, you know, we
feel like there's still a lot of low-hanging fruit in code. It is a big focus for us, and I think in the near future you'll find many more good options for the right code model tailored to your use case. Yeah, I find often, if I just need a quick answer, like how to write something in Dart, I'll give it to 4.1. But yeah, for something bigger... And I think that's going to be the harder part, because these evals are in some ways saturated, but also
everybody has their own criteria they look at, and that's going to be a question: how are we going to adapt to all that? Right. Yeah, I mean, specifically in code, I think there's more beyond
¶ Coding with taste
did it get you the right answer. People care about the style of the code. They care about how verbose the comments are. They care about how much proactive work the model did for you on other functions. So there's a lot to get right, and users often have very different preferences here. Yeah, it's funny. People used to ask me, what domains are going to be transformed by AI
fastest? And I used to say code, because, similar to math and other things, it's very, very verifiable and testable, and those are the domains that are particularly great to do RL on. You're therefore going to see all this awesome agentic stuff just suddenly work.
I still think that's true. But the thing that surprised me about code is that there is still so much of an element of taste in what makes good code. And there's a reason people train to be professional software engineers. It's not because their IQ gets better, but because they learn how to build software inside an organization. What does it mean to write good tests? What does it mean to write good documentation?
How do you respond when someone disagrees with your code? Those are all actual elements of being a real software engineer that we're going to have to teach these models. So I expect progress to be fast, and I still think code has a ton of nice properties that make it very ripe for agentic products. But I do think it's very interesting, the degree to which the element of taste and style in real-world software engineering matters.
It's interesting, too, because with ChatGPT and the other models, you're kind of dealing with having to bridge the divide between consumer and pro. I open up ChatGPT and tell my friends, oh yeah, I'll plug it into whatever code model I'm working with, because I can actually connect it there. And I think about how that's a very different use case from a lot of other people's. Although I've shown people how to go into
an IDE and have it just write documents for you and create folders and stuff, which people don't realize: yeah, you can do that, you can have ChatGPT actually control it, which is cool. But then you think about, okay, we've got a tab now for images, there's the Codex tab,
so I can connect to GitHub and have it work through there, and there's Sora in there. So it's interesting to see how all of these things are coalescing. How do you differentiate between a consumer feature, a professional feature, and maybe an enterprise feature?
Look, we build very general-purpose technology, and it's going to be used by a whole range of folks. Unlike many companies, which have a kind of founding user type and then use technology to solve that user's problems, we oftentimes start with the technology, observe who finds value in it, and then iterate for them. Now, with Codex, our goal was very much to build for professional software engineers,
knowing, though, that there's sort of a splash zone where a lot of other people will find value in it, and we'll try to make it accessible for those people as well. There are a lot of opportunities to target non-engineers. I'm personally really motivated to create a world, or help build a world, where
anyone can make software. Codex is not that product, but you could imagine those products existing over time. But as a general principle, it's really hard to predict exactly who the target user is until we make some of these general-purpose technologies available, because it gets back to the empiricism I was talking about. We just never know exactly where the value is going to lie. Yeah. And even
to dig deeper into that: you could have a person who's mostly using ChatGPT for coding, but five percent of the time they might just want to talk to the model, or five percent of the time they just want a really cool image. So there are certainly archetypes of people who use the models, but in practice, we see that people want exposure to the different capabilities.
With Codex, watching the launch of that, it struck me: there are some tools where you see a lot of excitement because there's a lot of internal demand for them. How much are you using tools like that internally? More and more.
¶ Internal adoption of Codex
Okay. I've been really excited to see the internal adoption. It's everything from exactly what you'd expect, people using Codex to offload their tests, to an analyst workflow that looks at logging errors, automatically flags them, and Slacks people about them. And I've heard some people are using it as a to-do list: for future tasks they're hoping to get to, they're starting to fire off Codex tasks.
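(A bare-bones sketch of the kind of workflow Nick describes: scan logs for errors and flag them in Slack. The webhook URL, log path, and regex are placeholders, and in the real pipeline Codex reportedly does the triage, not a regex.)

```python
# Hypothetical log-to-Slack flagging script (illustrative only).
import re
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"  # placeholder
LOG_PATH = "/var/log/app/errors.log"  # placeholder

def scan_and_flag() -> None:
    with open(LOG_PATH) as f:
        errors = [line.strip() for line in f if re.search(r"\bERROR\b", line)]
    if errors:
        # Slack incoming webhooks accept a simple JSON payload.
        requests.post(SLACK_WEBHOOK, json={
            "text": f"{len(errors)} errors in the latest logs; first: {errors[0]}"
        })

if __name__ == "__main__":
    scan_and_flag()
```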
So this is the perfect type of thing that I think you can dogfood internally. I'm very excited about the leverage engineers are going to get out of a tool like this. I think it's going to allow us to move faster with the people we have, and make each engineer we hire even, like, ten times more productive. So in some ways, internal usage is a very good predictor of where we want to take this. Yeah, I mean, we don't want to ship
something to other people that we don't find value in ourselves. And, you know, leading up to the launch... Laundry Buddy. Laundry Buddy. Laundry Buddy is an essential partner. Okay, sorry, sorry. I mean, yeah, we had some power users, though, hundreds of PRs a day that they were generating personally. So there are people internally finding a lot of utility in what we're building. Also, if you think about internal adoption, it's a good
reality check, because people are busy, and adopting new tools takes some activation energy. So the thing you find when you try to dogfood things internally is
the reality of how long it takes people to actually adjust to a new workflow, and it's been humbling to watch. So I think you learn about the technology, but you also learn about some of the adoption patterns, when you're trying to get a bunch of busy people to change the way they write code.
¶ Skills that matter: curiosity, agency, adaptability
As you build these tools, internally people have to learn how to use them and adapt. And there's a lot of question now about what kind of skills people need for the future. What kind of skills do you look for on your teams? I've thought about this a lot. Hiring is hard,
especially if you want a small team that is very, very good and humble and able to move fast, et cetera. And curiosity has been the number one thing I've looked for. It's actually my advice, too,
to students when they ask me, what do I do in this world where everything's changing? Because for us, there's so much we don't know that there's a certain amount of humility you have to have about building on this technology. You don't know what's valuable, and you don't know what's risky, until you really study, go deep,
and try to understand. And when it comes to working with AI, which we obviously do a lot, not just in code but in every facet of our work, it's asking the right questions that is the bottleneck, not necessarily getting the answer.
So I really fundamentally believe we need to hire people who are deeply curious about the world and about what we do. I care a little bit less about their experience in AI. Mark presumably feels a bit differently about that one. But on the product side, curiosity is
what I've found to be the best predictor of success. No, I mean, even on research, I think we increasingly index less on "you have to have a PhD in AI." This is a field people can pick up fairly quickly. I also came into the company as a resident, without much formal AI training. And, correlated to what Nick said, I think one important thing is for our new hires to have agency. OpenAI is a place where you're not going to get
so much of a "here's today: you're going to do thing one, thing two, thing three." It's really about being driven to find: here's the problem, no one else is fixing it, I'm just going to dive in and fix it. Also adaptability, right? It's a very fast-changing environment. That's just the nature of the field right now, and you need to be able to quickly figure out what's important and pivot. The agency thing is real. I think we often get asked
how we keep shipping; it feels like we're pushing something out every week or so. It's funny, because it never feels that way to me. I always feel like we could be going even faster. But fundamentally, we just have a lot of people with agency who can ship. That goes for product, that goes for research, that goes for policy.
Shipping can mean different things. We all do very different things at OpenAI, but I think the ratio of people who can actually do things, and the lack of red tape, except where it matters, the couple of areas where red tape is very, very important,
I think that is what makes OpenAI very unique, and it obviously affects the type of people we want to hire, too. I was brought into the company because I was originally given access to GPT-3, and I just started showing all these use cases for it, making videos every week.
¶ OpenAI's "Do Things" culture
Yeah, and that was annoying people, I'm sure. No, it was not. It was really fascinating. It was an exciting time. I described it to people like: I think they built a UFO and I get to play with it. And then I make it hover, and people go, oh, you made it hover. And I'm like, well, they built it. I just pressed the button. But
what I found very empowering was the fact that I'm self-taught. I learned to code through Udemy courses and things like that. And then to be a member of the engineering staff and be told, just go do stuff.
Nothing too critical. I didn't break anything, or anybody. And it's good to know that that kind of spirit is still there. I think that's part of the reason why OpenAI is able to ship, even though, you know, it was like 150, 200 people who worked on GPT-4. I think people forget that. Totally. And honestly, this is how, even with ChatGPT, this is how it came together. We had a
research team that had been working for a while on instruction following, and then the successor to that, and on post-training these models to be good at chat. But the product effort came together as a hackathon. I remember distinctly: we said, who's excited to go build consumer products? And we had all these different people.
We had a guy from the supercomputing team who was like, I'll make an iOS app, I've done that in a past life. We had a researcher who wrote some of the backend code. It was a convergence of people who were excited to do stuff, and the ability to do so.
And I think that's how you get the next ChatGPT: by running an organization where that is possible, and continues to be possible at scale. Hackathons were my favorite thing, one, because I'm a performer and I love show-and-tell, but it was just neat to be able to see things that you knew were going to be a product or something later on, because you're playing with technology that's this advanced. Do you guys still do them? Yeah, absolutely.
We've had some fairly recently. Last week, actually. Can't say what it was about, but it was an exciting thing. And it's how you find out what's possible. I'm excited to hear that. I do have a question, which is: how much, as it grows again? When I started there were like 150 people in the company, now there's like 2,000, and now
I see a video with Sam talking to Jony Ive. How much is that going to change the character, the spirit? Bringing in all this outside expertise has been great, and we've seen this great run of products, but do you see it changing the culture?
Well, I mean, I think probably in the right way. When we look at AI, we don't think of it as some fairly narrow thing. We've always been kind of enthralled by the potential, and all the different things you could build with AI.
And to Nick's point, this is why we're able to ship so quickly: because people imagine all these different possibilities, they imagine the future with AI, and they try to bring it about. And I think these are facets of that imagination. Like, what does AI look like if you imagined an AI-first device, for instance? Yeah, when you go from 200 to 2,000, you'd think a lot would change, and maybe in some ways it has.
But I think people often underestimate the number of things we're doing. I always feel like being at OpenAI feels much closer to being in a university, where you've got this common reason for being there, but everyone's doing something different, and you'll sit down at dinner or at lunch and talk to someone and learn about their thing, and you're like, wow, that's so cool that you're doing that.
And so it feels much smaller, because of the broad range of things we're doing. Each individual effort, whether that's something like ChatGPT or something like Sora, is actually staffed in a very, very conservative and lean way that keeps people very autonomous while making sure they have resources, et cetera. So I think it's partly that which has made it feel very similar, in the good ways, to when I started here.
We talked a bit about how one of the things you look for is curiosity, and Mark said that's helpful too. If I'm somebody outside of AI, say I'm 25, or I'm 50, and I'm looking at the advancement of this technology, I maybe have a little bit of fear, because I see copywriting is one of the things ChatGPT got great at, and writing code is great. I personally have the opinion that
we'll never have enough people creating code, because there are more things code can do in the world than we can imagine. And even the copy shows up in unexpected places: my wife showed me the other day, on her sunblock lotion bottle, some very funny copy about the ingredients. I said, oh, this is not a place I expected to see this.
But that's one of the tiny little places where all of a sudden you can put more thought in. That being said, I know that I'm a bit of an optimist, because I see all these opportunities and places to go. What advice do you give people, at whatever point they are in life, about
¶ Adapting to an AI future
preparing for, adapting to, or being part of the future? I like how Mark just looked right at me. He said, you take this. I can go. Okay, I'm jumping in right now. I think the important thing is you have to really lean into using the technology, right? And you have to see how
your own capabilities can be enhanced, how you can be more productive, more effective by using the technology. I fundamentally do think that the way this is going to evolve is you will still have your human experts, but who AI helps the most are the people who
don't have that capability at a very advanced level. So if you imagine, as these models get much better at healthcare advice, they're going to help the people who don't have access to care the most. Image generation, right? It's not
producing an alternative to experts or professional artists. It's allowing people like me and Nick to create creative expressions, right? And so I think it's a rising tide that allows people to be competent and effective at a lot of things all at once, and I think that's how we're going to see a lot of these tools bootstrap people. The world's going to change a lot, and I think truly everyone
has a moment where the AI does something that they considered sacred and human. I know a guy that got bested and felt very threatened about his achievements and coding abilities. Well, that happened for me a long time ago. He must be talking about someone else in the room. Oh, yeah. I mean, yeah, it's definitely
better than me at a lot of coding problem-solving, for sure. Right. So I think it's deeply human to feel some level of awe, respect, and maybe even fear. And to Mark's point, actually using this thing can demystify it. I think we all grew up with, or learned about, the word AI in a world where AI meant something pretty different from what we have today. You've got these algorithms that try to sell you things, or you've got movies where the AI takes over, et cetera.
And like that term means so many things to different people that I'm entirely unsurprised that, you know, there's fear. So actually using the thing is, I think, the best way to have a grounded conversation about it.
And then I think, from there, the best way to prepare: there's some degree to which you need to understand the products and keep up, sure. But things like prompt engineering, or understanding the intricacies of this AI, are kind of not the right direction. I think it's
the fundamental human things, like learning how to delegate, that are incredibly important, because increasingly you're going to have an intelligence in your pocket that can be your tutor, your advisor, your software engineer.
It's much more about you understanding yourself, the problems you have, and how someone else might help, than about a specific understanding of AI. So I think that's going to be important. Curiosity, which I mentioned earlier, and asking the right questions; you only get what you put in, right?
That's important. And, fundamentally, being ready to learn new things. The more you understand how to pick up new topics and domains, the more prepared you'll be for a world where the nature of work is shifting much faster than it has ever shifted before. So I'm prepared for my job and my product to look different or not exist at all. But I am looking forward to picking up something new. And I think as long as you bring that perspective,
you're well set up to leverage AI. I think we sometimes over-index on certain jobs going away, because, you know, we don't really need a lot of typewriter repair people anymore, right? And certain kinds of coding jobs are probably going to go away. But like I said, I think there's way more opportunity for coders, or for people to create code, however it's done.
¶ The opportunities ahead: healthcare, research
And you mentioned the health field. That's one of the things I hear people worry about: when we replace everything with AI... Well, I mean, I would be very happy having an AI diagnose me, operate on me, and probably do everything else. But I do want somebody there to talk me through the procedure and hold my hand.
But also, I want to be able to ask questions like: every day I take a bunch of vitamins. What's the right time of day to take them? I can't bother my doctor with all these silly little questions. I really don't think you end up displacing doctors. You end up displacing not going to the doctor.
You end up democratizing the ability to get a second opinion. Very few people have that resource or know to take advantage of a resource like that. You end up bringing medical care into pockets of the world where that is not readily available.
And you end up helping doctors gain confidence. I've often heard from doctors that they already talk to existing colleagues to get a second opinion. In some cases, that's not possible. And I think you'd be surprised by the number of doctors that use ChatGPT.
Now on things like medicine, there's work to make the model really, really good. And we're excited to do that work. There's also work to prove that the model is really good because I think you're not going to trust that until there's some degree of sort of legitimacy.
And then there's work to explain the areas where the model might not be good, because increasingly, once it gets to human-level and then superhuman-level performance, it's hard to frame exactly where it will fall short, which is also hard to reckon with.
But nonetheless, I think that opportunity is one of the things that gets me up in the morning. Education might be the other one. And I think there's a tremendous opportunity to help people. What do you think is going to surprise us the most in the next year to 18 months?
I honestly think it's going to be the amount of research results that are powered, even in some small way, by the models that we've built. One of the quiet things that's taken the field by storm is the ability of the models to reason. And you already see some research... I'm going to make you explain. Yeah. You say reason?
I want you to reason through the question as you explain reasoning. Think out loud. Yeah, think out loud. Tell us your traces. Yeah, this really fits into the agentic paradigm that we were talking about earlier. The way the models approach solving a problem that takes some time to solve is that they reason through it, much like you or I might.
If I give you a very complicated puzzle... I think you reason probably much better than I do, Mark. I'm flattered. Yeah, take a complicated puzzle; let's use a crossword puzzle, for instance. You might think through all the different alternatives and what's consistent. Is this row consistent with that column? You're searching through a lot of alternatives, you're backtracking a lot, you're testing a lot of hypotheses.
And then at the end, you come up with a well-formed answer. And so the models are getting a lot better at that, and that's what's powering a lot of the advancements in math, in science, in coding. This has reached a level where, today, in many research papers, people are using o3 almost as a subroutine: there are sub-problems within the research problems they're trying to solve which are just fully automated, solved by plugging into a model like o3.
I've seen this in several physics papers. I've talked to physicists who were like, wow, I had this expression that I couldn't simplify, but o3 made headway on it. And these are coming from some of the best physicists in the country. So I think you're going to see that happen more and more, and we're going to see acceleration in progress in fields like physics and mathematics.
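To make the "model as a subroutine" idea concrete, here is a minimal sketch of handing one well-scoped sub-problem to a reasoning model, assuming the OpenAI Python SDK; the model name, helper function, and prompt are illustrative choices, not details from the conversation.

```python
# A minimal sketch, assuming the OpenAI Python SDK is installed and
# OPENAI_API_KEY is set; "o3-mini" and simplify_expression are
# illustrative stand-ins, not a documented recipe.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def simplify_expression(expression: str) -> str:
    """Hand one well-scoped sub-problem to a reasoning model and return its answer."""
    response = client.chat.completions.create(
        model="o3-mini",  # any reasoning model would do here
        messages=[
            {"role": "user",
             "content": f"Simplify this expression, showing your steps:\n{expression}"},
        ],
    )
    return response.choices[0].message.content

print(simplify_expression("(x**2 - 1) / (x - 1)"))
```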
It's a hard one to beat, because I would swap many things we do in exchange for making a true, significant scientific advancement. But I think we can have multiple of these things. For me, it's the fact that any well-described problem that is intelligence-constrained,
I think, will be solved in products. And I think we're fundamentally just limited by our ability to do that. So what that means is, in companies, in the enterprise, there are so many problems that are fundamentally hard that the models are not smart enough to do yet.
Whether it's software engineering, whether it's running data analyses, whether it's providing amazing customer support, there are all these problems that the models fall short on today that are very, very
easy to describe and evaluate, and I think we'll make tremendous progress on those. On the consumer side, these problems exist too. They're a bit harder to find, just because consumers are worse at telling us exactly what they want; that's the nature of building consumer products. But I think it's very, very worthwhile. There are many hard things we do in our personal lives, whether it's doing taxes, whether it's planning a trip, whether it's
searching for a high-consideration purchase, whether that's a house or a car or a piece of clothing. All those things are problems where we need just a little bit more intelligence.
And the right form factor. So I think the other thing that's going to happen in the next year and a half is you'll see a different form factor for AI evolve. I think chat is still an incredibly useful interaction model, and I don't think it's going to go away. But increasingly, you're going to see more of these sort of asynchronous
workflows. Coding is just one example, but for consumers it might be sending this thing off to go find you the perfect pair of shoes, or to go plan a trip, or to go finish your taxes. And I think that's going to be exciting, and we're going to think of AI a little differently than just a chatbot. One of my favorite examples, both from a utility and capability point of view and from a UI point of view, was
¶ Async workflows and the superassistant
Deep research. And deep research is probably the best example we have right now of agentic model use, because it used to be that you would ask a model to tell you about a topic, and it would either already have the data or just do a big search of the internet and summarize it all. Whereas deep research will go find some set of data, look at it, ask a question, then go find some new data, come back to it, and keep going. And I think the first time
I used it, like other people used it, I was like, wow, this is taking a while. And then you added a UI change so I can actually go away and do something else, and the lock screen on my phone will show me it's working, which was a paradigm shift. I talked to Sam here about that, and Sam said that was a surprise to him: the fact that people would be willing to wait for answers.
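As a rough illustration of the find-data, ask-a-question, find-new-data loop described here, the sketch below shows a toy iterative research agent under stated assumptions; `web_search` is a stub, and the whole function is a hypothetical illustration, not OpenAI's deep research implementation.

```python
# A toy sketch of the "find data, look at it, ask a new question, repeat"
# loop described above -- not OpenAI's actual deep research internals.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def web_search(query: str) -> list[str]:
    # Stub: plug in a real search/retrieval backend here.
    return [f"(placeholder result for: {query})"]

def ask_model(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def deep_research(question: str, max_rounds: int = 5) -> str:
    notes: list[str] = []
    query = question
    for _ in range(max_rounds):
        notes.extend(web_search(query))  # go find some new data
        follow_up = ask_model(
            f"Question: {question}\nNotes so far: {notes}\n"
            "What should we search next? Reply DONE if the notes suffice."
        )
        if follow_up.strip().upper() == "DONE":
            break
        query = follow_up  # refine the query and go around again
    return ask_model(f"Using these notes, answer: {question}\nNotes: {notes}")
```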
And now I've seen a new metric for models: how long a model can spend trying to solve a problem, which is a good metric if it ultimately solves it. Has that been an update to how you think about these things? The idea, and I guess you talked about this before with agentic stuff, that it's not just give me the answer; it's take your time, get back to me.
I think, you know, to build a super assistant, you've got to relax constraints. Today you have a product that is entirely synchronous; you have to initiate everything. That's just not the best way to help people. If you think about a real-world
intelligence that you might get to work with, it has to be able to go off and do things over a long period of time, and it has to be able to be proactive. So we're in this process of relaxing a lot of the constraints on the product and on the technology to better mimic
a very, very helpful entity. The ability to go do five-minute tasks, five-hour tasks, eventually five-day tasks, is a very, very fundamental thing that I think is going to unlock a different degree of value in the product. So I've actually not been that surprised that people are willing to wait. I don't really want to be sitting around waiting for my coworker either, and I think if the value is there,
I'd gladly be doing other stuff and come back. Yeah, and we really don't do it just because, right? We do it out of necessity. The model needs that time to solve the really hard coding problem or the really hard math problem, and
it's not going to do it with less time, right? You can think about it like this: I give you some kind of brain teaser, and your quick answer is probably the intuitive, wrong one. You need that actual time to work through other cases, to ask, are there any gotchas here? And I think it's that kind of stuff that
ultimately makes robust agents. We've seen, there's always the paper of the moment, where somebody comes out and says, ah, I found a blocker. I remember there was one a month or so ago that said models couldn't solve certain kinds of problems, and it wasn't hard to figure out a prompt
that you could train into a model so it could solve those kinds of problems. And there was a new one that talked about how models would fail at certain kinds of problem solving, and that was quickly debunked, I think, by showing that the paper had flaws in it. But
there are limitations. There might be some blockers, things we don't yet know are going to be there. I think brittleness is one of them. There's a point where models can only spend so much time solving a problem. We're probably at a point where we have maybe two systems watch each other, and we have to think about how a third system steps in when things break down. But do you see any
blockers between here and getting to models that are going to be doing things like coming up with interesting scientific discoveries? I mean, I think there are always technical innovations that we're trying to come up with. Fundamentally, we're in the business of producing simple research ideas that scale, and the mechanics of actually getting that to scale are difficult. It's a lot of engineering, a lot of research, to figure out
how to get past a certain roadblock. And I think those are always going to exist, right? Every layer of scale gives you new challenges and new opportunities. So, fundamentally, the approach is the same, but we're always encountering new small challenges that we have to overcome.
Just to build on that, the other business we're in is building great products with these models. And I think we shouldn't underestimate the challenge and the amount of discovery needed to really bring these ever more intelligent models into the right environment, whether that's giving them the right action space and tools, or really being proximate to the problems that are hardest, understanding those, and bringing the AI there.
So I think there's the technical answer, but there's also the real-world deployment. And that always has challenges that are very, very hard to predict, yet it's worthwhile, and it's part of our mission to do it all.
¶ Favorite ChatGPT tips
All right, last question, and I'll begin: what's your favorite user tip for ChatGPT? Mine is, I take a photograph of a menu and I'm like, help me plan a meal, because I'm trying to stick to a diet or whatever. See, I really want that use case, but I've been trying it with wine lists, and that's my eval for multimodality. It still doesn't work. Really? It keeps embarrassing me with hallucinated wine recommendations; I go order one and they're like,
never heard of this. So I'm glad yours works, but for me, that's still a broken use case. Well, I mean, maybe the wine list is too dense. That was a problem. That was a problem with Operator originally, and with the vision models: too much dense text, and it loses its placement.
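For anyone who wants to try the menu-photo tip themselves, here is a minimal sketch of the standard image-input pattern, assuming the OpenAI Python SDK and a vision-capable model; the model name, file name, and prompt are illustrative assumptions, not details from the episode.

```python
# A minimal sketch of the "photograph a menu, ask for meal advice" tip,
# assuming the OpenAI Python SDK; "gpt-4o", "menu.jpg", and the prompt
# are illustrative choices.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the photo as base64 so it can be sent inline with the request.
with open("menu.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Here is a restaurant menu. Suggest a meal that fits a low-carb diet."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)
```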
Yeah, I mean, speaking of deep research, I love using deep research. When I go meet someone new, when I'm going to talk to someone about AI, I just pre-flight topics. I think the model can do a really good job of contextualizing who I am, who I'm about to meet, and what things we might find interesting. And I think it really helps with that whole process. Very cool.
I'm a voice believer. I don't think it's entirely mainstream yet, because it's got many little kinks that all add up. But for me, half the value of voice is actually just having someone to talk to and forcing yourself to articulate yourself, which I sometimes find very difficult to do in writing. So on my way to work, I'll use it to process my own thoughts, and with some luck, and I think this works most days, I'll have
a restructured list of to-dos by the time I actually get there. So voice, for me, continues to be the thing that I both love using and want to see improve over the next year.