OpenAI researcher on why soft skills are the future of work | Karina Nguyen (Research at OpenAI, ex-Anthropic)

Feb 09, 2025 • 1 hr 15 min

Episode description

Karina Nguyen leads research at OpenAI, where she’s been pivotal in developing groundbreaking products like Canvas, Tasks, and the o1 language model. Before OpenAI, Karina was at Anthropic, where she led post-training and evaluation work for the Claude 3 models, created a document upload feature built on 100K-token context windows, and contributed to numerous other innovations. With experience as an engineer at the New York Times and as a designer at Dropbox and Square, Karina has a rare firsthand perspective on the cutting edge of AI and large language models. In our conversation, we discuss:

• How OpenAI builds product

• What people misunderstand about AI model training

• Differences between how OpenAI and Anthropic operate

• The role of synthetic data in model development

• How to build trust between users and AI models

• Why she moved from engineering to research

• Much more

Brought to you by:

Enterpret—Transform customer feedback into product growth

Vanta—Automate compliance. Simplify security

Loom—The easiest screen recorder you’ll ever use

Find the transcript at: https://www.lennysnewsletter.com/p/why-soft-skills-are-the-future-of-work-karina-nguyen

Where to find Karina Nguyen:

• X: https://x.com/karinanguyen_

• LinkedIn: https://www.linkedin.com/in/karinanguyen28

• Website: https://karinanguyen.com/

Where to find Lenny:

• Newsletter: https://www.lennysnewsletter.com

• X: https://twitter.com/lennysan

• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:

(00:00) Introduction to Karina Nguyen

(04:42) Challenges in model training

(08:21) Synthetic data and its importance

(12:38) Creating Canvas

(18:33) Day-to-day operations at OpenAI

(20:28) Writing evaluations

(23:22) Prototyping and product development

(26:57) Building Canvas and Tasks

(33:34) Understanding the job of a researcher

(35:36) The future of AI and its impact on work and education

(42:15) Soft skills in the age of AI

(47:50) AI’s role in creativity and strategy development

(53:34) Comparing Anthropic and OpenAI

(57:11) Innovations and future visions

(01:07:13) The potential of AI agents

(01:11:36) Final thoughts and career advice

Referenced:

• What’s in your stack: The state of tech tools in 2025: https://www.lennysnewsletter.com/p/whats-in-your-stack-the-state-of

• Anthropic: https://www.anthropic.com/

• OpenAI: https://openai.com/

• What is synthetic data—and how can it help you competitively?: https://mitsloan.mit.edu/ideas-made-to-matter/what-synthetic-data-and-how-can-it-help-you-competitively

• GPQA: https://datatunnel.io/glossary/gpqa/

• Canvas: https://openai.com/index/introducing-canvas/

• Barret Zoph on LinkedIn: https://www.linkedin.com/in/barret-zoph-65990543/

• Mira Murati on LinkedIn: https://www.linkedin.com/in/mira-murati-4b39a066/

• JSON Schema: https://json-schema.org/

• Anthropic—100K Context Windows: https://www.anthropic.com/news/100k-context-windows

• Claude 3 Haiku: https://www.anthropic.com/news/claude-3-haiku

• A.I. Chatbots Defeated Doctors at Diagnosing Illness: https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors-diagnosis.html

• Cursor: https://www.cursor.com/

• How AI will impact product management: https://www.lennysnewsletter.com/p/how-ai-will-impact-product-management

• Lee Byron on LinkedIn: https://www.linkedin.com/in/lee-byron/

• GraphQL: https://graphql.org/

• Claude in Slack: https://www.anthropic.com/claude-in-slack

• Sam Altman on X: https://x.com/sama

• Jakub Pachocki on LinkedIn: https://www.linkedin.com/in/jakub-pachocki/

• Lennybot: https://www.lennybot.com/

• ElevenLabs: https://elevenlabs.io/

• Westworld on Prime Video: https://www.amazon.com/Westworld-Season-1/dp/B01N05UD06

• A conversation with OpenAI’s CPO Kevin Weil, Anthropic’s CPO Mike Krieger, and Sarah Guo: https://www.youtube.com/watch?v=IxkvVZua28k

• Tuple: https://tuple.app/

• How Shopify builds a high-intensity culture | Farhan Thawar (VP and Head of Eng): https://www.lennysnewsletter.com/p/how-shopify-builds-a-high-intensity-culture-farhan-thawar

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

Lenny may be an investor in the companies discussed.



Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe

Transcript

Not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge. When I first came to Anthropic, I was like, oh my God, I really love frontend engineering. And then the reason why I switched to research is because I realized, oh my God, Claude is getting better at front end. Claude is getting better at coding. I think Claude can develop new apps. What skills do you think will be most valuable

going forward, for product teams in particular? Creative thinking. You kind of want to generate a bunch of ideas and filter through them in order to build the best product experience. I think it's actually really, really hard to teach the model how to be aesthetic, to do

really good visual design, or how to be extremely creative in the way they write. What do you think people most misunderstand about how models are created? When you taught the model some of the self-knowledge of, you actually don't have a physical body to operate in the physical world, the model would get extremely confused. Today, my guest is Karina Nguyen. Karina is an AI researcher at OpenAI, where she helped build Canvas, Tasks, the o1 chain-of-thought model, and more.

Prior to OpenAI, she was at Anthropic, where she led work on post-training and evaluation for the Claude 3 models, built a document upload feature with 100K context windows, and so much more. She was also an engineer at the New York Times and a designer at Dropbox and at Square. It's very rare to get a glimpse into how someone working on the bleeding edge of AI and LLMs operates and how they think about where things are heading.

In our conversation, we talk about how teams at OpenAI operate and build products, what skills she thinks you should be building as AI gets smarter, how models are created, why synthetic data will allow models to keep getting smarter, and why she moved from engineering

to research after realizing how good LLMs are going to be at coding. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or on YouTube. It's the best way to avoid missing future episodes, and it helps the podcast tremendously. With that, I bring you Karina Nguyen.

This episode is brought to you by Enterpret. Enterpret unifies all your customer interactions, from Gong calls to Zendesk tickets to Twitter threads to App Store reviews, and makes them available for analysis. It's trusted by leading product orgs like Canva, Notion, Zoom, Linear, Monday.com, and Strava to bring the voice of the customer into the product development process,

helping you build best-in-class products faster. What makes Enterpret special is its ability to build and update customer-specific AI models that provide the most granular and accurate insights into your business. Connect customer insights to revenue and operational data in your CRM or data warehouse to map the business impact of each customer need and prioritize confidently,

and empower your entire team to easily take action on use cases like win-loss analysis, critical bug detection, and identifying drivers of churn with Enterpret's AI assistant, Wisdom. Looking to automate your feedback loops and prioritize your roadmap with confidence, like Notion, Canva, and Linear? Visit enterpret.com to connect with the team and get two free months when you sign up for an annual plan.

This is a limited-time offer. That's enterpret.com/lenny. This episode is brought to you by Vanta. And I am very excited to have Christina Cacioppo, CEO and co-founder of Vanta, joining me for this very short conversation. Great to be here. Big fan of the podcast and the newsletter. Vanta is a longtime sponsor of the show. But for some of our newer listeners, what does Vanta do and who is it for?

Sure. So we started Vanta in 2018 focused on founders, helping them start to build out their security programs and get credit for all of that hard security work with compliance certifications like SOC 2 or ISO 27001. Today, we help over 9,000 companies, including some startup household names like Atlassian, Ramp, and LangChain, start and scale their security programs and ultimately build trust

by automating compliance, centralizing GRC, and accelerating security reviews. That is awesome. I know from experience that these things take a lot of time and a lot of resources, and nobody wants to spend time doing this. That was very much our experience before the company, and to some extent during it. But the idea is that with automation, with AI, with software, we are helping customers build trust with prospects and customers in an efficient way.

And, you know, our joke is, we started this compliance company so you don't have to. We appreciate you for doing that. And you have a special discount for listeners. They can get $1,000 off Vanta at vanta.com/lenny. That's V-A-N-T-A dot com slash Lenny for $1,000 off Vanta. Thanks for that, Christina. Thank you. Karina, thank you so much for being here. Welcome to the podcast. Thank you so much, Lenny, for inviting me. I'm very excited to have you here because...

Not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge of AI and LLMs. You recently launched this Tasks feature, which is basically the first agent feature from OpenAI. I also just did this survey. I don't know if you know about this. I did a survey of my readers and asked them what tools they use every day in their work and use most.

And ChatGPT was number one, above Gmail, above Slack, above anything else. 90% of people said they use ChatGPT regularly. It's absurd. And it wasn't around two years ago. Yeah. Also, we're recording this the week that OpenAI announced Stargate, which is this half-trillion-dollar investment in AI infrastructure. So there's just a lot happening constantly in AI. And you have a really unique glimpse into

how things are working, where things are going, how work gets done. So I have a lot of questions for you. I want to talk about how you operate and how you work at OpenAI, where you think things are going, what skills are going to matter more and less in the future, and also just where things are going broadly. So how does that sound? Sounds great. Thank you so much. Yeah, I was extremely lucky to join Anthropic in the early days and learned a lot of things

there, and I joined OpenAI around eight months ago. So yeah, I'm excited to be here today. Okay, I'm definitely going to ask you about the differences between those, but I want to start more technical and just dive right in. I want to talk about model training.

People always hear about models being trained, these big models, how much data it takes, how long it takes, how much money it takes, how we're running out of data, which I want to talk about. Let me just ask you this question. What do you think people

most misunderstand about how models are created? Model training is more an art than a science. In a lot of ways, we as model trainers think a lot about data quality. It's one of the most important things in model training: how do you ensure the highest-quality data for a certain interaction or

model behavior that you want to create. But the way you debug models is actually very similar to the way you debug software. One of the things that I learned in the early days at Anthropic, especially with Claude 3 training, was that when you taught the model some self-knowledge, like, hey, you actually don't have a physical body to operate in the physical world,

but then at the same time we had data that taught the model some of the function calls, like, this is how you set an alarm, the model would get extremely confused about

whether it can set an alarm when it doesn't have a body in the physical world. The model gets confused, and sometimes it over-refuses: sometimes it's like, I don't know, sorry, I cannot help you. So there's always a balance, a trade-off, between how you make the model more helpful for users while not being harmful in other scenarios. It's always about how you make the model more robust so it operates across a variety of diverse scenarios.

That is so funny. I never thought about that. Most of the data it's trained on kind of assumes it's a human describing the world and how they operate, so it assumes there's a body and you can do things, and then the model is told, you don't have a body. Yeah. Okay.

I want to talk a little bit about data while we're on this topic. I know you have strong opinions here. There's kind of this meme that models are going to stop getting smarter because they're running out of data. They're trained in...

large part on the internet, and there's only one internet, and they've already been trained on it. What more can you show them about the world? And there's this trend of synthetic data, this term synthetic data. What is synthetic data? Why do you think it's important? Do you think it's going to work? I think there are two questions here. We can unpack one at a time. But when people say we're hitting the data wall, I think people think more in terms of

pre-trained large models that are trained on the entire internet to predict the next token. But what the model is actually learning during that process is how to compress. The model learns a compression algorithm: it learns to compress a lot of knowledge, and it learns how to model the world. With next-token prediction, like "teach me how to drive a...", you only have a few words that will match that, like "car." So the model actually learns about the world in itself.

So it's modeling human behavior. And when you talk to pre-trained models, which are very, very large, they're actually extremely diverse and extremely creative, because you can talk to almost any Reddit user through the pre-trained model. But I think what's happening right now, with the new paradigm of the o1 series, is that the scaling in post-training itself

is not hitting the wall. And that's because we basically went from the raw datasets of pre-training models to an infinite amount of tasks that you can teach the model in the post-training world via reinforcement learning. Any task, for example: how to search the web, how to use the computer, how to write well.

There are all sorts of tasks that you're trying to teach the model, all the different skills. And that's why we're saying there's no data wall or whatever, because there will be an infinite amount of tasks. And that's how the model becomes extremely intelligent,

and we are actually getting saturated on all the benchmarks. So I think the bottleneck is actually in evaluations: we don't have the frontier evals. Like GPQA, which is Google-Proof Question Answering, a PhD-level

intelligence benchmark, is getting to, I don't know, more than 60, 70 percent, which is what a PhD gets. So we're literally hitting the wall on evals. I want to follow both of those threads. The first is on this idea of synthetic data.

Is a simple way to understand it that the models are generating the data that future models are trained on? You ask it to generate all these ways of doing stuff, all these tasks, as you described, and then the newer model is trained on this data that the previous model generated.

Some tasks are synthetically curated, so this is an active research area: how do you synthetically construct new tasks for the model to learn? Sometimes, you know, when you develop products, you get a lot of data from the product and user feedback, and you can use that data too in this post-training world.

Sometimes you still want to use human data, because some of the tasks can be really, really hard to teach. Experts only know certain knowledge about some chemicals, biological knowledge, so you actually need to tap into the expert knowledge a lot. So yeah, I think,

to me, synthetic data training is, for product, more about rapid model iteration toward product outcomes. We can dive more into it, but the way we made Canvas and Tasks and new product features for ChatGPT was mostly done by synthetic training.

Let's actually get into that. That's really interesting. I want to talk about evals, but let's follow that thread. So talk about how this helped you create Canvas. So when I first came to OpenAI, I really had this idea: okay, it would be really cool for ChatGPT to actually change the visual interface, but also change the way it works with people. Going from being a chatbot to more of a collaborative agent and a collaborator is a step towards more agentic systems

that ultimately become innovators. And so the entire team of applied engineers, designers, product, and research kind of formed in the air, almost out of nothing. It's just a collection of people who got together and rapidly started iterating with each other. Actually, Canvas is, I would say, one of the

first projects at OpenAI where researchers and applied engineers started working together from the very beginning of the product development cycle. And I think there are a lot of things that we learned along the way, but I definitely came in with the mindset that we needed to do really rapid model iteration, such that it would be much easier for engineers

to work with the latest model possible, but also learn from user feedback or early internal dogfooding how to improve the model very rapidly. You know, it's really hard to figure out, when you deploy a product, how people will use it. And so the way you synthetically train the model is basically figuring out the most core behaviors that you want this product feature

to do. For Canvas, for example, it came down to three main behaviors. The first was how to trigger Canvas for prompts like "write me a long essay," where the user's intention is mostly iterating over a long document, or "write me a piece of code," and when not to trigger Canvas for prompts like "can you tell me more about President...",

I don't know, some general question. You don't want to trigger Canvas there, because the user's intention is mostly getting an answer, not necessarily iterating on a long document. The second behavior was how we teach the model to update the document when the user asks. One of the behaviors we taught the model is to actually have some agency and autonomy to literally go into the document, select specific sections, and

either delete them or add to them, so highlight and rewrite certain sections. Sometimes the user would just say, change the second paragraph to be something friendlier, and you would have to teach the model to literally find the second paragraph in the document and change it to a friendlier tone. So basically, you teach both how to trigger the edit itself, and also how to get a higher-quality edit for the document.

In the case of coding, for example, there's also the question of how good the model is at completely rewriting the document versus making very specific, targeted edits. So that's another layer of decision boundary within the edit itself: select the entire document and rewrite it completely, or make a very targeted, custom edit. When we first launched, we biased the model towards more rewrites, because we saw that the quality of the rewrites

was much higher. But over time, you shift that based on user feedback and what you're learning from iterative deployment. Lastly, the third behavior we taught the model synthetically is how to make comments on any document. The way we did that is, we would use an o1 model

to simulate a user conversation. Let's say, "write me a document about XYZ." We used o1 to produce the document, and then we injected a user prompt like, "oh, make some comments, critique my piece of writing," or "critique this piece of writing that you just made." And then we taught the model to make comments

on very specific, targeted parts of the document. There's also the question of what kind of comments you want the model to make: do they make sense or not? How do you teach the quality of that? It all came down to measuring progress via very robust evals. But yeah, this is how you would use o1 for synthetic data generation for this training.
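To make that concrete, here is a rough sketch of the synthetic-data loop she describes; the call_model helper is hypothetical, standing in for whatever internal inference API is actually used:

```python
def call_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for a chat-completion call to a strong model like o1."""
    return f"<model response to: {messages[-1]['content'][:50]}>"

def make_comment_example(topic: str) -> list[dict]:
    # 1. Simulate a user asking for a document.
    convo = [{"role": "user", "content": f"Write me a document about {topic}."}]
    # 2. Have the strong model produce the document.
    convo.append({"role": "assistant", "content": call_model(convo)})
    # 3. Inject a synthetic user turn asking for comments on that document.
    convo.append({"role": "user",
                  "content": "Make some comments critiquing this piece of writing."})
    # 4. The model's comments become the training target; the full exchange is
    #    one candidate post-training example, filtered later with evals.
    convo.append({"role": "assistant", "content": call_model(convo)})
    return convo

example = make_comment_example("the history of the printing press")
```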

Okay, this is so interesting. So you talk about this idea of teaching the model, and you mentioned how you use synthetic data to teach the model different behaviors.

Is a simple way to think about it, basically, that you do that by showing it what success looks like, using evals? Like, here's what doing this successfully would look like, and that teaches it: okay, I see, this is what I should do. Yeah, great. Yeah, amazing. Yeah, you got it. Okay, got it. I want to start unpacking what your day-to-day looks like as you're building these sorts of things. Is it like you sitting there talking to

some version of ChatGPT, crafting these evals? Sometimes I do that. Sometimes I do sit with ChatGPT. Actually, I think I learned this at Anthropic: people spend so much time just prompting models and qualitatively testing them all the time, and you actually get a lot of new ideas for how to make the model

better. It's like, oh, this response is kind of weird, why is it doing this? And you start debugging, or you start figuring out new methods for how to teach the model to respond in a different way, like have a better personality, say. It's the same thing as how personality is made in these models.

Very similar methods. But yes, my time there has changed. When I first came, it was mostly research IC work, so I was building a lot of things.

I was writing code, training models, writing evals, working with PMs and designers to teach them how to even think about evaluations. That was a really cool experience, and I think this is an adaptation of how we do product management for AI features, for AI models. But now it's mostly, you know, management and mentorship. I'm still doing IC research code

after 4 p.m., though. But yeah, it's kind of changed. All right, we won't talk too much about being a manager, because everyone's firing their managers. Who needs managers anymore? That's what I hear now. Just kidding. It's interesting that so much of your time was spent on teaching product teams how evals integrate and how important that is. I've heard this a few times, and I haven't personally experienced it yet, so I think it's an important thread to follow: how writing

these evaluations is going to become an increasingly important part of the job of product teams, especially when they're building AI features and working with LLMs. So can you talk a bit more about what that looks like? Is it like sitting there with an Excel spreadsheet, basically showing: here's the input, here's the output, here's how good the

result was? Talk about what that actually looks like, very practically. It certainly depends on what you're developing, but there are various types of evaluations. So sometimes I ask product managers, or model designers (a new role that we have), to go through some of the user feedback, or think of various user conversations

that should have triggered Canvas: under these circumstances it should trigger Canvas, and then you have this ground-truth label: okay, with this conversation it should trigger Canvas, under this conversation it should not trigger Canvas. And you have these very binary, deterministic evals

for decision-related behaviors like this. When we were launching Tasks, for example, making correct schedules was absolutely really hard for the model,

but we built out some deterministic evaluations: okay, if the user says 7 p.m., the model should say 7 p.m. So you can have a deterministic eval that is pass or fail.
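A minimal sketch of the kind of deterministic, pass/fail evals she describes, covering both the Canvas trigger decision and Tasks scheduling; the cases and predict functions are hypothetical:

```python
TRIGGER_CASES = [
    ("Write me a long essay about the Roman Empire.", True),   # iterate on a doc: open Canvas
    ("Can you tell me more about President Lincoln?", False),  # just wants an answer: stay in chat
]

SCHEDULE_CASES = [
    ("Remind me to go to lunch at 7 p.m.", "19:00"),  # 7 p.m. must come back as 7 p.m.
]

def pass_rate(cases, predict):
    """Fraction of cases where the model's output exactly matches the ground-truth label."""
    return sum(predict(prompt) == expected for prompt, expected in cases) / len(cases)

# Usage, with whatever inference wrappers a team actually has:
# print(pass_rate(TRIGGER_CASES, predict_canvas_trigger))
# print(pass_rate(SCHEDULE_CASES, extract_schedule_time))
```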

And the way it works is, sometimes I ask product managers to just go create a Google Sheet with different tabs: what's the current behavior, what's the ideal behavior, and why, or some notes. Sometimes we use it for evals. Sometimes we use it for training, because if you give the spreadsheet to an o1 model,

it can probably figure out how to teach itself the good behavior. And I think there's a second type of eval that's more prevalent: human evaluations. You can have specific trainers, or internal people, where you have a conversation with a prompt and then various completions from models, and you choose the win rate: which model is best, which model produced the highest-quality comment or edit.
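The pairwise human evals she mentions reduce to a simple win-rate computation; the judgments below are made up for illustration:

```python
from collections import Counter

# Each judgment: a rater saw one prompt with a completion from each model
# and picked the better one.
judgments = [("p1", "new"), ("p2", "new"), ("p3", "old"), ("p4", "new")]

wins = Counter(winner for _, winner in judgments)
win_rate = wins["new"] / len(judgments)
print(f"new-model win rate: {win_rate:.0%}")  # 75% on these made-up ratings
```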

Then you can have continuous win rates, and as you develop new models, they should always win over the previous models. So it depends on what you want to measure. So interesting. Basically what I'm hearing, and this is something I'm learning about as I talk to people, is that product development might move from this: here's a spec, a PRD, let's build it together,

and then, cool, let's review it, are we happy with this? From that to: hey, AI, build this thing for me, and here's what correct looks like. And I'm spending all my time on what correct looks like, on evals, essentially. You definitely want to

measure progress of the model, and this is where evals come in, because you already have a prompted model as a baseline. The most robust evals are the ones where the prompted baseline gets the lowest score, because then you know that if you've trained a good model, it should just hill-climb on that eval, while also not regressing on other intelligence evals. So I think it's more

art than science. That's what I'm saying: if you optimize the model for this behavior, you don't want to brain-damage it in other areas of intelligence. And this is happening all the time, in every lab and every research team.
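A toy version of that hill-climb-without-regression check, with made-up scores and a hypothetical tolerance:

```python
baseline  = {"canvas_trigger": 0.42, "gpqa": 0.61, "math": 0.70}  # prompted baseline
candidate = {"canvas_trigger": 0.88, "gpqa": 0.60, "math": 0.69}  # post-trained model

TOLERANCE = 0.02  # allowed drop on the evals we are not optimizing

improved = candidate["canvas_trigger"] > baseline["canvas_trigger"]
regressions = [name for name in ("gpqa", "math")
               if baseline[name] - candidate[name] > TOLERANCE]

print("hill-climbed on target:", improved)  # True
print("regressed elsewhere:", regressions)  # [] means no "brain damage" beyond tolerance
```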

I would say prompting is also a way to prototype new product ideas. Early days at Anthropic, I was working on the file uploads feature. I remember I was just prompting the model when we were launching 100K context. I was prototyping this in a local version, I did the demo, people really, really loved it, and they just wanted an API for file uploads or something. And then that's when it clicked for me.

I also wrote a blog post about this in February. It clicked for me that prompting is a new way of product development, or prototyping, for designers and for PMs. For example, one of the features that I wanted to do was personalized starter prompts. So whenever you come to Claude, it should

recommend you starter prompts based on what your interests are. And you can literally do prompting for that, to experiment with that. Another feature was generating titles for the conversations. It's a very small micro-experience, but one I'm really proud of. The way we did it was, we took the five latest conversations from the user and asked the model, what's the style of this user? And then for the next new conversation, the generated title would be of the same style. It's just really little micro-experiences like this.
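A sketch of that title-generation trick as she describes it, two prompted calls, with a hypothetical call_model helper:

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a completion call."""
    return "<model output>"

def generate_title(recent_conversations: list[str], new_conversation: str) -> str:
    # Step 1: infer the user's style from their five latest conversations.
    style = call_model(
        "Here are a user's five latest conversations:\n"
        + "\n---\n".join(recent_conversations[-5:])
        + "\nDescribe the style of this user in one sentence."
    )
    # Step 2: title the new conversation in that same style.
    return call_model(
        f"User style: {style}\n\nConversation:\n{new_conversation}\n\n"
        "Write a short title for this conversation in the same style."
    )
```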

That's so cool. Did you do that at Anthropic or at OpenAI? At Anthropic. Okay, cool. I love the file upload feature that Claude has, by the way. Oh, ChatGPT doesn't have that yet, is that right? I think it has. I think the way it's implemented is very different, though. Okay, maybe it's the PDF feature, because I use it all the time with Claude. Okay, that's cool. So they need to get on that.

Man, it's wild how many features you built that I use every day and that many people use every day. This prototyping point you made is really important. It's something that comes up a ton on this podcast too: how that is maybe the way AI has most impacted the job of product builders

recently. Instead of showing just, here's a PRD, here's a design, PMs more and more just show: here's a prototype of the idea that I have, and it's working, you can play with it. Yeah. Okay, I want to spend a little more time on how you operate. So you talked about how you built and launched this Tasks feature. Is that the way you describe it, Tasks? Yeah.

So talk about how that emerged, and let's better understand how you collaborate with product teams and how OpenAI works in that way, whatever you can share there. I think Canvas and Tasks go into the bucket of more short- or medium-term projects. Actually, the way Canvas and Tasks came about was, it started with one person prototyping

and creating a spec. It's kind of like a PRD: creating a spec of the behavior of the model. I don't think Tasks is an extremely

groundbreaking feature, necessarily. What makes it really cool is that the models are so general: they can now search, they can write sci-fi stories, they can search for stocks, they can summarize the news every day. Because the models are so general, you give something familiar to people. Notifications are very familiar, having reminders is very familiar, so you create a form factor that feels

very familiar. Same with Canvas, right? Google Docs is very familiar, but then you add a magical AI moment and it becomes very powerful. The way it comes about, operationally, is usually a prototype, a literally prompted prototype of how you would want the model to behave. For Tasks, for example, you need a little bit of design thinking. It's like, okay, well,

if the user says, "remind me to go to lunch at 8 a.m. tomorrow," what kind of information does the model need to extract from that prompt in order to create a reminder? And this is how you design a spec for a new feature, a tool; Canvas and Tasks are both tools. So it's, how do you create the tool stack? And then it's mostly developing a JSON schema. Okay, from this prompt, maybe the model should extract

the time that the user requested. Then you're thinking about which format you want the time to be in. And then, how do you want the model to notify you? Basically, the user gives an instruction to the model, and this instruction will fire off every day or something, at that particular time. So for example, if you say, every day I want to

know about the latest AI news, the model should write that into: okay, search for the latest AI news, and this task will get fired at the particular time that the user requested. And then, you know, that's how you design the tool spec.
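A guess at what such a tool spec might look like as a JSON schema; the field names here are hypothetical, not OpenAI's actual schema:

```python
task_tool_schema = {
    "name": "create_task",
    "description": "Create a reminder or recurring task for the user.",
    "parameters": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "time": {
                "type": "string",
                "description": "When to fire, e.g. ISO 8601: 2025-02-10T08:00:00",
            },
            "recurrence": {
                "type": "string",
                "enum": ["none", "daily", "weekly"],  # one-off vs. "every day"
            },
            "instruction": {
                "type": "string",
                "description": "What to do when it fires, e.g. 'Search for the latest AI news.'",
            },
        },
        "required": ["title", "time", "instruction"],
    },
}
```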

And then, actually, sometimes it's through conversations. People ask me to join a team and they're like, oh my god, we need researchers, we need some support, we need to train the models. Or sometimes, as with Canvas, I just pitched the idea and it got staffed quite immediately, during the break. So it depends on the project. And usually with staffing, it's mostly a product manager, a model designer,

an actual product designer, a couple of researchers, applied engineers. It depends on the complexity of the project. And then, you know, Tasks took, I don't know, two months or so to go from zero to one, basically. Oh, wow. For Canvas, it was like four or five months, I guess, to go from zero to one.

But yeah, and then you teach product managers how to build evals, and also how to think not only about shipping the better feature but longer-term: what kind of cool features do you want Tasks to have? I think it would be nice for Tasks to be a little bit more personalized. It would be nice to be able

to create tasks via voice on mobile, right? This is how you get a research roadmap right here: thinking about how the feature will be developed in the future. And then from there, you start creating datasets, with evals you want to make sure go well, and then you

need to make a trade-off between which methods you want to use. And the reason why I really love relying purely on synthetic data, instead of collecting data from humans, is that it's much more scalable. It's cheap, because you literally sample from the model, and you teach the core behaviors of the model, and that will generalize with all sorts of

diverse coverage. And when you launch the beta feature, you learn so much from the users that all your synthetic sets can be shifted toward the distribution of how the users actually behave, and this is how you improve. This is what happened with Canvas, too, when we went from beta to GA. This episode is brought to you by Loom.

Loom lets you record your screen, your camera, and your voice to share video messages easily. Record a Loom and send it out with just a link to gather feedback, add context, or share an update. So, now you can delete that novel-length email that you were writing. Instead, you can record your screen and share your message faster. Loom can help you have fewer meetings and make the meetings that you do have much more productive.

Meetings start with everyone on the same page and end early. Problem solved. Time saved. We know that everyone isn't a one-take wonder when it comes to recording videos, so Loom comes with easy editing and AI features to help you record once and get back to the work that counts. Save time, align your team, stay connected, and get more done. with Loom. Now part of Atlassian, the makers of Jira. Try Loom for free today at loom.com slash lenny. That's L-O-O-M dot com slash lenny.

Something that I want to help people understand, and I don't even 100% understand this myself, is: what's the simplest way to understand the job of a researcher versus, say, a model designer and the other folks involved? What do researchers do at OpenAI? So the projects that I described are mostly product-oriented research, product research. Another component of my team is more longer-term exploratory projects, and it's more about

developing new methods and understanding those methods in a variety of circumstances. To develop new methods, you need to follow a very similar recipe of building evals, but more sophisticated evals: you want to have out-of-distribution evals, or if you want to measure generalization, you need to capture that. But it's basically more science, in a way.

Take synthetic data. One of the hardest things about synthetic data is, how do you make it more diverse? Diversity in synthetic data is one of the most important questions right now, and

exploring ways to inject diversity, as a general method that will work for everything, is one of the research explorations. Other ones are more about developing new capabilities. I feel like it's all about, you know, you work on a new method and you get signs of life that it's working, and then you either think about how to make it more general, or you think about how to

make it very useful. And this is how longer-term projects become more medium- or short-term projects. That makes sense. So essentially, working on developing ways to make the models smarter: o4, o5, o6. Yeah, any ways to...

Like, o1 was a big breakthrough, right? The way it operates, where it's not just, here's your answer. It actually thinks. Right, it takes time to think through the process of coming up with an answer. Okay, yeah, very helpful. Speaking of that, of thinking about the future, where things are going:

I want to spend some time on just this insight that basically you are building the cutting edge of AI, like at the very bleeding edge of where AI is going and where it is. And so I'm very curious to hear just your take on...

how you think things are going to change in the world, and how people work, based on where you see things going. I know it's a broad question, but let's say in the next three years, how do you see the world changing? How do you see people's ways of working changing? It's a very humbling experience to be in both labs, I guess. To me, when I first came to Anthropic, I was like, oh no, I really love frontend engineering. And then the reason why I switched to research

is because I realized, at that time, oh my god, Claude is getting better at frontend, Claude is getting better at coding. I think Claude can develop new apps or something, so it can develop new features for the thing that I'm working on. So it was kind of this meta realization of, oh my god,

the world is actually changing. And when we first launched 100K context at that time, obviously, you know, I was thinking about form factors. File uploads were very natural, very familiar to people. But you can imagine we could have just made infinite chats in the

claude.ai app, right? As if it's in a 100K context. But with file uploads, it's form follows function: the form factor of file uploads enabled people to literally upload anything, books, any reports, financial documents, and ask any task of the model. And then I remember, you know, enterprise customers,

like financial customers, were really interested in that. Oh wow, it's actually one of the very common tasks people do in that setting. It was kind of crazy to see how some of the redundant tasks are getting automated, basically, by these smart models. And we're entering the era where

I actually don't know, for example, whether o1 sometimes gives me the correct answer or not, because I'm not an expert in that field. And I don't even know how to verify the outputs

of these models, because only experts can verify this. So yes, basically there are trends going on. The first trend is that the cost of reasoning and intelligence is drastically going down. I had a blog post about this; maybe I should update it with the latest benchmarks, because at that time MMLU was the benchmark everybody was doing,

that one benchmark, and then we quickly saturated the benchmark, and now we need to do the same plot but with another frontier eval. But the cost of intelligence is going down because it becomes much cheaper. Small models are becoming even smarter than large models, and that's because of the distillation research. This happened with Claude 3 Haiku.

I was working on the positioning of Claude 3 Haiku, and I realized it was much smarter than Claude 2, which was way bigger, or something like that. That's the power of small models becoming very intelligent and fast and cheap; we are moving towards that world. That has multiple implications, but it means that people will have more access to AI, and that's really good. Builders and developers will have much better access to AI.
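For the curious, a toy illustration of why distillation lets small models punch above their weight: the student learns from the teacher's full output distribution (soft targets), not just one-hot labels. Numbers here are invented:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

teacher_logits = [4.0, 1.5, 0.2]   # big model's scores for three next tokens
student_logits = [2.0, 1.0, 0.5]   # small model, before this training step

T = 2.0  # temperature softens the teacher so relative preferences show through
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# KL(teacher || student): the distillation loss the student minimizes.
kl = float(np.sum(p_teacher * np.log(p_teacher / p_student)))
print(f"distillation loss: {kl:.4f}")
```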

But it also means all the work that has been bottlenecked by intelligence will be kind of unblocked. So anyone... I'm thinking about healthcare, right? Instead of going to a doctor, I can ask ChatGPT, give ChatGPT a list of symptoms and ask whether I have a cold, the flu, or something else. I can literally get access to a

doctor, almost, and there have been some research studies around that. Yeah, there's a New York Times story about that, where they compared doctors, to doctors using ChatGPT, to just ChatGPT. And just ChatGPT was the best of them all; the doctors made it worse. Yeah, yeah. That's crazy, right? Same with education. I think

if I had a tool like ChatGPT when I was young, I would have learned so much. People can now learn almost anything from these models: they can learn a new language, they can learn how to build new things, look up anything they want. And it's humbling to have launched Canvas and brought that thing to people, enabling them to do something they couldn't

ever do before. I think there's something magical about this experience. So education will have massive implications. I guess scientific research too, right? I think the theme of any AI research is augmented AI research.

It's kind of scary, I'd say. Which makes me think that people management will stay, you know? Emotional intelligence is one of the hardest things for the models, and creativity in itself is one of the hardest things. So writers, I don't think

people should be worried as much. I think it alleviates a lot of redundant tasks for people. This is awesome. Okay, I want to follow this thread for sure. And it's funny that what you described is: you were an engineer at Anthropic and you were like,

okay, Claude is going to be very good at engineering, this isn't going to be a potential career long-term, so I'm going to move into research, and AI is going to need me for a long time to build it, to make it smarter. I would say we still have... I think the Canvas team still has really cool frontend engineers that I really like, you know, people who really care about

interaction design, the interactive experience. I don't think models are there yet. But we can get the models to this top 1% of frontend or something, for sure. So what I want to move on to next, along these lines, and this is just speculation: what skills do you think will be most valuable going forward, for product teams in particular? Folks are listening and they're like, okay, is this

scary? What should I be building now to help me stay ahead and not be in trouble down the road? What skills do you think are going to be more and more important to build? Yeah, I think creative thinking. You want to generate a bunch of ideas and filter through them in order to build the best product experience. And listening: you know, you want to build something such that

the most general model will not replace you. Oftentimes you build something and you make it really, really good for a specific set of users, and actually the moat is now in your user feedback. The moat is more in whether you listen to them, whether you can rapidly iterate. The moat is

in here. There are so many ideas; I think there's an abundance of ideas to work on. I wouldn't be worried. In fact, I do think people in AI fields... I wish they were a little bit more creative in connecting dots across different fields or something like that, to develop a really cool new generation,

a new paradigm of interactions with this AI. I don't think we've cracked this problem at all. A couple of years ago, I was telling some people, you know, you kind of want to build for the future. It doesn't necessarily matter whether the model is good or not good right now, but you can build product ideas

such that by the time the models are really good, it will work really well. And I think that just happened naturally, for example with Claude Artifacts, and I feel like the early days of Canvas, back in 2022, before ChatGPT, a writing IDE. But I feel like the Claude

1.3 model itself was not there yet to make really good, high-quality edits, for example in coding. And I feel like startups like Cursor are doing super well, and that's because they iterate so fast, they invent new ways of training models, they move really fast, they listen to users, they have massive distribution.

Yeah, it's kind of cool. That's really helpful, actually. So what I'm hearing is that soft skills essentially are going to be more and more important and powerful. You talked about management, leading people, being creative and coming up with innovative insights, listening.

There's a post I wrote that I'll link to where I try to analyze how AI will impact product management. And we're actually very aligned. And my sense was the same thing, that soft skills are going to become more and more important.

And the things that are going to be replaced are the hard skills, which is interesting, because usually people value the hard skills: coding, design, writing really well. And it's interesting that AI is actually really good at that, because it's taking a bunch of data, synthesizing it,

and writing, creating a thing, versus all these fuzzy things around what influences and convinces people to do things, aligning and listening, like you said, creativity. Does anything along those lines come up as I say that? I think it's actually really, really hard to teach the model how to be aesthetic, or do really good visual design, or how to be extremely creative in the way it writes. I think...

I still think that ChatGPT kind of sucks at writing, and that's because it's bottlenecked by this creative reasoning. I think prioritization is one of the most important. For a manager, I feel like...

Actually, AI research progress is bottlenecked by research management. It's because you have a constrained set of compute, and you need to allocate the compute to the research path that you feel the most convinced about. You need to have really high conviction in the research path to put the compute behind it. It's more of a return-on-investment kind of situation. So yeah, I'm thinking a lot about,

okay, across all my projects, which projects are higher priority: prioritization. And also, on the lower level, which experiments are really important to run right now and which are not, and cutting through the line. So it's going to be prioritization, communication, management, people skills, empathy, understanding people, collaboration. I think Canvas wouldn't have been an amazing launch

if it wasn't about the people, and I think it's a wonderful group of people. I got the chance to work with people like Lee Byron, who's a co-creator of GraphQL, and some of the best Apple designers. It's so cool to see. And how do you create this collaboration between people? It's just

something that's still humane, I think. Let me just follow this a little bit, because I imagine people listening are like, okay, but once we have AGI or ASI, it'll do all this. There's a world where, why isn't all this done? I think it's easy to just assume all that. I'm curious, on this idea of creativity and listening, why you think AI isn't good at it, other than it's just very hard to train it

to do this well. Is there anything there? Why is this especially difficult for AI and LLMs to get good at? I think currently it's difficult for many reasons. It's still an active research area, and it's something that my team is working on: okay, how do we teach the model to be more creative in its writing? And actually, I'm thinking,

this new paradigm of the models thinking more should actually lead to better writing in itself. But when it comes down to idea generation, or discriminating what is good visual design and what is not, I feel like it hasn't learned examples from people to discriminate that very well. I do think it's because,

you know, there are not that many people who are actually really good at this, and it's not accessible for models to learn from these people, I guess. So definitely that's why it sucks. Yeah, that makes sense. Basically, there's not enough of you yet: researchers teaching it to do these things, slash people who have incredible taste and creativity who can teach these things. You could argue

this will come, but we don't need to keep going down that thread. Let me ask you a specific question. In this post I wrote, I made this argument that a lot of people disagreed with: that strategy is something that AI tooling will become increasingly great at and take over. There's this sense that that's the thing that people will continue to be much better at, that you can't offload developing your strategy, telling you what to do to win. My case is:

isn't strategy just taking all the inputs, all the data you have available, understanding the world around you, and coming up with a plan to win? It feels like AI, like an LLM, would be incredibly smart at this. What's your take? I think so too. I think...

again, you teach the model all sorts of tools and capabilities and reasoning, right? And when it comes down to it, for Canvas right now, it would be very cool for the model to just aggregate all the feedback from users: summarize for me the top five most painful flows or user experiences. And then the model itself is very capable of

thinking, of knowing how it's being made, of figuring out how to create a dataset for itself to train on. And I don't think we are far away from that kind of self-improvement, models becoming self-improving, where the product development is basically self-improving, kind of like its own organism or something.

Yeah, again, strategy is more data analysis and coming up with... I think what models are really good at is connecting the dots. Okay, if you have user feedback from this source, but you also have an internal dashboard with metrics, and then you have

you know, other kinds of feedback or inputs, then it can co-create a plan for you, recommendations even. And I think this is one of the most common use cases for ChatGPT:

coming up with these sorts of things. That makes sense. Essentially, a human can only comprehend so much information at once and look at so much data at once to synthesize takeaways. And as you said, these context windows are huge now. Here's all the information. What's the most important thing I should do?

Yeah, same with scientific research. Ideally, the model would be able to suggest new ideas, or iterate on an experiment, or, given the empirical results of the previous experiments, come up with new ideas or methods. Yeah. Oh man. Okay, so just to close the loop on this part of the thread:

the skills you're suggesting people focus on building and leaning into are soft skills, like creativity, managing, influence, collaboration, looking for patterns. Is that generally where your mind is at? Yeah. I'm thinking a lot about how we make organizations more effective, and I think this is mostly management, I guess: how do you organize research teams, or generally teams, how do you

compose teams such that they will maximally succeed, or be at the maximal performance of what's possible. If you can literally create the next generation of computers, it's just a matter of conviction and the way you manage through that. It's scaling organizations, or scaling product research, I guess. Yeah, I think you're basically saying building this thing

and not doing it efficiently is limiting the potential of the human species right now. It's mismanagement within the research teams at OpenAI and Anthropic and some of these other model companies. Yeah, it's kind of crazy to think about. Holy moly.

Okay, so speaking of Anthropic and OpenAI, you've worked at both. Very few people have worked at both companies and have seen how they operate. I'm curious just what you've noticed about the differences between these two, how they operate, how they think, how they approach stuff. What can you share along those lines?

They're more similar than different. Obviously there are some differences; it also comes down to nuances of culture. I really love Anthropic, and I have a lot of friends there, and I also love OpenAI, and I still have a lot of friends there too. So it's not about enemies. I feel like in AI it's all, yeah, they're competitors, there are enemies, but it's actually one big community

of people doing the same thing. I would say what I've learned from Anthropic is this real care and craft towards model

behavior, model character, model training. And I've been thinking a lot about, okay, what makes Claude Claude and what makes ChatGPT ChatGPT? And I think it comes down to the operational processes that lead to the output, which is the model. The reason why Claude has so much more personality and is more like a librarian, I don't know...

I don't know, I'm visualizing Claude as being like a librarian, very nerdy or something. It's because I feel like it's a reflection of the creators who are making this model, and a lot of details around the character and the personality: whether the model should follow up on this question or not, what's the correct ethical behavior for the model in this scenario. It's a lot of craft

and curated datasets. And this is where I learned that part of the art, I guess, at Anthropic. I'd say Anthropic is much smaller. When I joined, it was, what, 70 people; when I left, it was 700 people, and obviously the culture changed so much. I really enjoyed the early-days startup vibes,

and people knew each other like a family, but the culture shifted. I would say I learned from Anthropic that they're much better at focusing and prioritization: very, very hardcore prioritization, I guess, and they need to do it. But I think OpenAI folks are much more innovative and much bigger risk-takers in terms of product or research. You know, your full-time job can be just teaching the world how to be creative writers. There's some luxury in that research freedom that comes with scale, maybe? I don't know. But I feel like I have much more creative product freedom to do almost anything within OpenAI, to evolve ChatGPT into the version that you want. It's more like...

Yeah, probably bottoms-up, I guess.

Yeah, that's how I was thinking about it. It feels like OpenAI is more bottoms-up: distributed, people bubble up ideas, try stuff, and that leads to more products launching, more things just being tried. Versus more of a "let's make sure everything we do is awesome and great and crafted, thinking deeply about every investment." That's really interesting. I've never heard it described this way.

Karina, we've covered so much ground. This is going to help a lot of people with so many ways of thinking about where the future is going. Before we get to our very exciting lightning round, I'm curious if there's anything else that you think might be helpful to share or get into.

One of my regrets, I guess, from the early days at Anthropic is that there was some luxury of the time, pre-ChatGPT, to actually come in with a bunch of ideas and a prototype almost every day. And I think we built a lot of cool ideas. Claude in Slack was actually one of the first tool-using products: Claude could operate in your workspace. It's kind of cool when you add Claude to summarize a thread. Maybe you have an entire conversation with someone and then you want a summary of what happened, and you can say, "@Claude, summarize this."

Also, it was really fun to even iterate on the model itself when you just talk to the model in Slack forever. It created a social element, which was kind of cool. It's kind of like Midjourney and its Discord: people learned so much about prompting and how to work with Claude.

Actually, one of the features that was an early Tasks prototype was: every Monday, Claude would just summarize the entire channel, or every Friday it would summarize a bunch of channels and give you the news about the organization or something. So it's a really cool form factor. I think form factors are a really important question in AI, especially since we haven't even figured out how to create an awesome product experience with o-series models: the shift from the synchronous, real-time, give-an-answer paradigm into a more asynchronous paradigm of agents working in the background.
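Claude in Slack itself is gone, but the Monday-digest form factor Karina describes is easy to approximate today. Below is a minimal sketch using the Slack SDK and the Anthropic Messages API; the channel ID, the prompt, and the scheduling choice are assumptions for illustration, not how the original prototype worked.

```python
# Minimal sketch of the "every Monday, summarize the channel" form factor.
# Assumes SLACK_BOT_TOKEN and ANTHROPIC_API_KEY are set and the bot has been
# invited to the channel; the channel ID and prompt are hypothetical.
import os
from slack_sdk import WebClient
from anthropic import Anthropic

slack = WebClient(token=os.environ["SLACK_BOT_TOKEN"])
claude = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

CHANNEL_ID = "C0123456789"  # hypothetical channel

def summarize_channel(channel_id: str) -> str:
    # Pull the most recent messages (the API returns newest first,
    # so reverse them into chronological order).
    history = slack.conversations_history(channel=channel_id, limit=200)
    transcript = "\n".join(
        m.get("text", "") for m in reversed(history["messages"])
    )
    # Ask the model for an organizational digest of the week.
    response = claude.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Summarize this week's discussion for the team:\n\n"
                       + transcript,
        }],
    )
    return response.content[0].text

def post_weekly_digest() -> None:
    summary = summarize_channel(CHANNEL_ID)
    slack.chat_postMessage(channel=CHANNEL_ID, text=summary)

# Invoke post_weekly_digest() from any Monday-morning scheduler (cron, etc.).
```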

But then the question is, the agents should build trust with you, right? And trust builds over time, like with humans. You start this collaboration, which is why this collaboration model between you and the model is so important: you both build trust, and the model learns from your preferences so that it can become more personalized, and it will start predicting the next action you want to take on the computer or something. It's much more predictive. We went from personal computers to personal models, basically.

Why is that not a thing? That seems like such an obvious feature that every LLM should have: a Slack bot version of them. Is that a thing I can install, or is that not a thing right now?

I know that Claude in Slack was sunsetted in 2023 or something, but I think that's because, after ChatGPT, the focus was mostly on consumer use cases or enterprise use cases. I think the form factor of Claude in Slack was a bit constrained when you wanted to develop new features.

I want that.

I know ChatGPT had a Slack bot too, so I don't know. Maybe it will come back.

All right. I would pay for that.

Any other memories from that time of early-days Anthropic? Because that's a really special place to have been. Any other memories or stories from that time that might be interesting to share?

I think the very first launch where it really clicked for me was the 100K context launch, when the models could take an entire book as input and give you a summary of the book, or take an entire financial report, or multi-file financial reports, and then give you an answer to a very specific question. I think there was something in there that was like, oh my god, this is a really cool new capability. Not a model capability so much as the capabilities that came from the product form factor itself.
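That launch pattern (stuff the whole document into the prompt and ask one pointed question) is now an ordinary long-context API call. Here is a minimal sketch against Anthropic's Messages API; the file name and the question are placeholders, not anything from the original launch.

```python
# Minimal sketch of the 100K-context pattern Karina describes: pass an
# entire document in the prompt and ask one specific question about it.
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("annual_report.txt") as f:
    report = f.read()  # the whole multi-hundred-page document

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            "<document>\n" + report + "\n</document>\n\n"
            "Using only the document above, what was the year-over-year "
            "change in operating margin, and what drove it?"
        ),
    }],
)
print(response.content[0].text)
```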

Other prototypes we were thinking about... there was one, Claude workspaces, that was kind of the same idea: Claude and I would have this shared workspace, and that shared workspace is a document, and we both write in the document. I feel like sometimes private ideas lag, and they lag for two years, just like in this case.

It's interesting, there are these milestones that kind of open up our view of what is happening and where things are going. ChatGPT, I think, was the first of just, wow, this is much better than I would have thought. You talked about 100K context windows, where you could upload a book, ask questions about it, and have it summarized. I actually use that all the time when I have interview guests who wrote a book. I sometimes don't have time to read the whole book, so I use it to help me understand what the most interesting parts are, and then I actually dive into the book, just to be clear. And then maybe Voice was another one, where you could talk to, say, ChatGPT. Are there any other moments where you're like, wow, this is much better than I thought it was going to be?

Yeah, I think the computer-use agents: the model operating the desktop. You can essentially think of a new kind of experience where

the model can learn the way you browse, and from that preference it can just browse like you. It's kind of like a simulated persona. And it's actually very similar to the idea of, okay, maybe Sam Altman doesn't have a lot of time, so maybe I want to talk to his simulation and ask it questions. Or, for example, I really appreciate some of the technical mentorship from Jakub, but he doesn't have a lot of time, so I really want to ask him these questions. How can you respond to that with simulated environments like this? It would be really cool.

That's a great place to plug Lennybot. I have one of those. It's trained on all of my podcasts and newsletters, and it sits on many models. I don't know exactly which one they use, but it's exactly that. And it's not even me: it's all the guests that have been on the podcast, plus the newsletter that I wrote. And you can just ask it: how do I grow my product? How do I develop a strategy? And it's actually shockingly good.

Do you feel like it reflects who you are?

The best part of it is you can talk to it. There's an ElevenLabs voice version that's trained on my voice now for this podcast, and it's actually very good. People have told me they sit there for hours talking to it.

Wow.

And somebody

told it, "Interview me like I'm on Lenny's podcast. Ask me questions about my career," and he did a half-hour podcast episode with it.

Oh my god, that's so fun.

It's incredible. The future is wild.

Yeah. I think content transformation is, you know... I would imagine sometime, when you generate a sci-fi story in Canvas, you could transform it into an audiobook. Very natural content transformations, from one medium to another. One of my early inspirations is one of the last episodes of Westworld, where Dolores comes to work, she comes to this new workspace, and she starts writing a story, and as she writes the story, a 3D virtual reality gets created on the fly. I kind of want to do this. It's kind of cool.

Wow. Speaking of mediums, I was wondering if I should go in this direction or not, but real quick: Kevin Weil, slash Kevin Wheel, I don't know exactly how to pronounce his last name, the CPO of OpenAI. Is it "while" or "wheel"?

I think "wheel."

Wheel. Okay, okay. Let's just say that.

He did a panel at the Lenny and Friends Summit last year, and he made this really fascinating point that chat is a really interesting interface for these tools, because they're just getting smarter and smarter, and chat continues to work as a paradigm for interacting with them. Similar to a human: you could talk to Albert Einstein, you could talk to someone not very smart, and it's all still conversation. So it's a really flexible way to interact with increasingly good intelligence. At some point it may not be so great, and you're talking about all these additional ways to interact that you're adding. But it's interesting that chat proved to be a really powerful layer on top of all this stuff.

Yeah, that's really cool. I feel like chat also has a social element, which is very humane. You know, you sometimes want to get into a group chat, and having conversations with the AI is kind of like a group chat in itself. The question is, how do you build features like this? I see Tasks as this general kind of feature that will scale very nicely as the models develop new capabilities themselves. The models will be able to do better searches, come up with more creative writing, render React apps and HTML apps. You can have a new puzzle every day, or continue the story from a few days ago. It scales very nicely.

You mentioned something as we were getting into this extra section we ended up going down: this idea of agents using a computer. I know this is actually something you're going to launch today, the day we're recording, which will be out by the time this episode comes out. It's called Operator. Can you talk about this very cool feature that people will have access to?

Yeah. I unfortunately did not work on that, but I'm really, really excited about this launch. Basically, imagine the model can complete a task in its own virtual computer, in its own virtual environment. You can give it literally any task, like "order me a book on Amazon," and then ideally the model will either follow up with you ("Which book do you want?") or know you so well that it starts recommending: here are five books I might recommend you buy. And then you hit "Yeah, help me buy it," and the model goes off into its own little virtual browser, completes the task, and buys the book on Amazon. And if you give the model credentials or credit cards (obviously that comes with a lot of trust and safety), it will just complete the thing for you.

It's a virtual assistant. It's interesting how this just sounds like, obviously this should happen. Why is this not yet a thing? Which is also mind-blowing, that we're just assuming this should exist. Just some

AI doing things for you on a computer, and you just ask it to. It's absurd.

It's actually really hard, and I think we're still cracking it. But, I don't know if you use Tuple? It's a pair-programming product.

No, but...

I don't know if you love pair programming.

Oh, yeah. Shopify uses this. I remember it came up on a podcast episode.

Oh, nice. Yeah, it's a very cool product where you can just call anyone at any time and share your screen, and the other person has access to the screen and can literally start operating your computer. And it's very real-time; the latency is very low, and it's very high quality. I kind of want the same thing: I want to pair-program with my model, and the model should, you know, talk to me, point at a very specific section of my code in VS Code, and teach me, and you could have different modes. This is a product right here for you. I don't know. Someone should build it.

It sounds like a startup just got birthed from someone listening to this. You mentioned that it's very hard to do this, an agent controlling a computer as you and helping out. What makes it so hard,

for however much you can explain briefly?

Much of it is that right now the models are operating on pixels instead of language or whatnot, and pixels are actually really, really hard for the models: visual perception. I think there's still a lot of multimodal research going on, but language scaled so much more easily than multimodal because of that. Another thing my team thinks about is how you derive human intent correctly. Does the model know enough information to ask a follow-up question, or to complete the task? You kind of don't want an agent to go off for 10 minutes and then come back with an answer you didn't even want; that actually creates a much worse user experience. And this comes with teaching the model people skills: what do people like? Creating a mental model of the user, and caring about the user, in order to ask the right questions. That part is actually hard for the models too.

That relates to what we talked about earlier, where the soft-skill, people-skills pieces are not where these models are strong yet.
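That "does the model know enough to act, or should it ask first?" gate can be seen in miniature. Below is a toy sketch of the clarify-before-acting pattern against the OpenAI chat API; the JSON protocol and the stubbed execute_in_browser helper are invented for illustration and are not how Operator actually works.

```python
# Toy sketch of the "ask a follow-up or go act?" gate Karina describes.
# The JSON decision protocol and execute_in_browser() are hypothetical.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM = (
    "You are an agent that completes tasks in a browser. Reply with JSON: "
    '{"action": "ask", "question": "..."} if you are missing information '
    'you need, or {"action": "act", "plan": "..."} if the task is fully '
    "specified and you can proceed."
)

def execute_in_browser(plan: str) -> None:
    print(f"(pretending to drive a browser: {plan})")  # stub for illustration

def run(task: str) -> None:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": task}]
    while True:
        reply = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            response_format={"type": "json_object"},
        )
        content = reply.choices[0].message.content
        decision = json.loads(content)
        if decision["action"] == "act":
            execute_in_browser(decision["plan"])
            return
        # The model decided it doesn't know enough: ask the human first,
        # rather than going off for ten minutes on the wrong errand.
        messages.append({"role": "assistant", "content": content})
        messages.append({"role": "user",
                         "content": input(decision["question"] + " ")})

run("Order me a book on Amazon.")
```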

Okay, I'm going to skip the lightning round and ask just one question from it, something fun. So when AI replaces your job, Karina, and it gives you a stipend, a monthly stipend, here's your salary for the month: what would you want to do? What do you want to spend your time on? What will you be doing in this future world?

I've been thinking about this a lot. I feel like I have a lot of job options. I would love to be a writer, I think. I think that would be super cool, just to write short stories, sci-fi stories, novels. I also really like art history. You know those conservationists in museums who try to preserve paintings, painting over them for days at a time? I think that'd be really cool to do.

That sounds beautiful. What I'm hearing is you need to nerf these models to not get very good at writing so that you can continue. Although at that point you don't need people to buy it. You're just doing it for fun, so it doesn't even matter if they're incredibly good at writing or art conservation.

Oh man, what an episode, what a conversation. What a wild time we're living in. Karina, thank you so much for being here. Two final questions: Where can folks find you online if they want to reach out and follow up on anything? And how can listeners be useful to you?

You can find me on Twitter. You can also shoot me an email through my website. My team is hiring: I'm looking for research engineers, research scientists, and machine learning engineers, including product engineers who want to learn model training. My team is called Frontier Product Research. We train models and develop new methods, but for product-oriented outcomes.

What a place to work. Holy moly. What's the best way for people to apply for these very lucrative roles?

I think you can shoot me a DM on Twitter; I'm yet to create a job description.

Okay, this is the job description.

Or you can apply under the post-training team.

Okay, you're going to get a flood of DMs. I hope you're prepared. Karina, thank you so much for being here. This was incredible.

Thank you so much, Lenny. Bye, everyone.

Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at Lennyspodcast.com. See you in the next episode.
