Episode 7 - Live from DevDay | OpenAI Podcast

⁠¶ Intro / Opening

00:00

Teachers and school leaders realizing, hey, this helps me in my job. We see doctors saving as much as an hour or more a day. Vibe coding is this idea of it's never been easier to create prototypes. I think we just saw a whole new way of experiencing the web. Hello and welcome to the OpenAI Podcast, where we're live from OpenAI DevDay. Here sitting with me from School

⁠¶ Caleb Hicks (SchoolAI)

00:25

AI is Caleb Hicks. Caleb, hello. Hi, thanks for having me. This will be fun. So Caleb, you are working on tools for helping educators and helping people basically in the classroom understand the progress of students. That's right. Yeah. So first off, what was your reaction so far to DevDay? A ton of fun. I think there are a lot of things to be excited about that help us build, but also help students and teachers be more creative as well. So that'll be fun.

00:52

So what have you been working on over the last year? What has changed with AI that's accelerated what you've been doing? I think probably the biggest advancement over the last year for us... So we put AI in students' hands. The main thing that we focus on is safe, managed AI that can act as kind of one-time personal tutors for students. And so probably the biggest change

01:18

from OpenAI has been model progression. I think we get two advantages from that. One is significant leaps in intelligence. And the other one is, you know, improvements in cost. Because we are working with an industry that isn't known for paying big dollars for software, it's been important for us to be able to manage students using this in a cost-effective way. So those have been the two areas that AI progression has helped.

01:47

From our work, it has been a lot of orchestration, which I'm sure we'll talk about a little bit, just getting different AI agents and models to work together for the best outputs for students in particular. So a couple of the releases we saw today, one was the Agents SDK, you know, and you've talked about that. How much have tools changed one's ability to work faster, and the scope of what you find is possible now?

02:15

Yeah, I think we're seeing teams across industries work way faster and build better software because they've got kind of this always-on expert, hyper-senior engineer next to them that they're pair programming with. Right. And so we see that with our teams as well. And that just allows us to build better software faster and get it in the hands of teachers and students, which is

02:42

what we're here to do. What has been the biggest shift you've seen in talking to educators or people like that in regards to AI in general? Yeah, great question. So every teacher, school, and district is on a very similar journey. It starts with permission, right? Two and a half years ago, everyone under the sun was just banning AI altogether. We've moved past that into productivity: realizing, hey, this helps me in my job. The leading schools are starting to get into

03:11

that really important spot, which is recognizing that every student has to know how to use this stuff. If you're graduating from high school and you're competing for colleges or jobs and you don't know how to use AI yourself, you're at a severe disadvantage.

03:31

Most people are now orienting to, like, yeah, we have to teach this. We think there's a couple of special steps beyond those two where you get to better support students: the AI tutor in every kid's pocket. Right. But we think that it's got to be classroom connected. It's got to know what you're doing in class and it's got to know where you're trying to go and help you move that direction. That last step we get really excited about is how you can put AI to work

04:00

with teachers, families at home, school leaders, kind of the system at large to really make school awesome for students. Could you tell me a little bit about the kind of stack you're using from both the teacher-facing, student-facing, backend, or what you're working on in that regard? Yeah, from the product side, we have kind of three different parts of the product that

04:25

students, teachers, and school leaders use. So there's just a pretty basic AI assistant, right? The GPT wrapper, as they say, but tuned to use cases in schools. I think a big thing that we felt was really important is teachers should never have to become prompt engineers. Right.

04:47

So, you know, I was a prompt engineer. Yes. Yeah. So we do a lot of that kind of extra orchestration. We essentially enrich every prompt that a teacher writes to get better output for them, for what they're teaching, their grade level, all that stuff. So there's that.
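
As a rough illustration of the prompt enrichment described here, a minimal sketch might look like the following. The function name, the context fields, and the wording of the added instructions are all assumptions made for illustration; the conversation does not describe how SchoolAI actually implements this.

```python
from dataclasses import dataclass


@dataclass
class TeacherContext:
    """Hypothetical classroom context; the field names are illustrative only."""
    grade_level: str
    subject: str


def enrich_prompt(teacher_prompt: str, context: TeacherContext) -> str:
    """Wrap a teacher's short request with classroom context.

    An assumed approach for illustration, not SchoolAI's real pipeline.
    """
    request = teacher_prompt.strip()
    if not request:
        raise ValueError("teacher_prompt must not be empty")
    # Prepend what the teacher would otherwise have to spell out by hand.
    return (
        f"You are assisting a {context.grade_level} {context.subject} teacher.\n"
        f"Keep the output appropriate for {context.grade_level} students.\n"
        f"Teacher request: {request}"
    )


if __name__ == "__main__":
    ctx = TeacherContext(grade_level="5th grade", subject="science")
    print(enrich_prompt("Make a lesson plan about the water cycle", ctx))
```

The point of the sketch is only that the teacher types the short request and the system supplies the rest, so the teacher never has to act as a prompt engineer.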

05:06

We call that Dot. It's a fun little blue animated character. And then we have tools. It's a form. You fill it out and it gives you an output: a lesson plan, adapted reading content, things like that. Those we like to call the 101,

05:20

like the checkers-level features, that kind of table stakes that you've got to give to teachers for them to move from "can I even allow this?" to "this is useful for me." But the special part is when you start doing kind of these one-time, guard-railed, safe, managed AI tutors that the teacher can create and give to their students. Students are interacting, and then the teacher gets a real-time dashboard of how the students are doing,

05:49

what they're doing with the AI. And to make that very concrete, the last five, 10 minutes of class, a teacher may give what's called an exit ticket to their students. So they've got a recording of everything that they did during class that day. And it loads up and says, like, hey, how did the content go today? And it does almost what's called a

06:11

formative quiz. It asks them questions and then it's coaching them on what they learned, what they want to learn, where they might go next, tees them up for whatever homework they might have. And then it will do just kind of a social-emotional check-in. It will say, like, hey, how was class? What feedback do you have for the teacher? What are you looking for out of using this content? And rolling all of that up to the teacher so that they know how to better support those students in the future.

06:43

I think what a lot of us don't recognize about teachers is they might be working with 300 students at a time. I had 42 desks in my classroom. So you started off as a teacher before all of this, and that informed a lot of what you're doing. Yeah. So I had 42 desks in my classroom and I was teaching seven or eight periods a day. Every day, teachers have to make this impossible choice. Do I work with...

07:05

the top 10% of students who love this? They get it. They want more of it. They'd stay after school if I let them. The 10% of students that are really struggling? Maybe that's because they don't understand it, maybe it's a learning disability, maybe it's a problem at home, maybe they got bullied in the last class. Or everyone else? And I care a lot about that everyone else, that middle 80%, because I was one of those students, and most of us were, definitionally.

07:34

The way we think about what we're building, and what we've been able to build with OpenAI, and what some of the tools that were announced today will let us do even more of, is giving teachers almost a GPS for impact. Like, these four students really need you today. And you can jump in and support those four students in a way that you maybe wouldn't have even known that they had a concern.

07:57

That brings up a very, I think, interesting point. When I talk to developers, often people confuse the tools for the product. And what you have to do is you have to both understand the needs of the educator. That comes from you working in the classroom and your peers working at SchoolAI and understanding that.

08:11

I think that's the thing you're able to bring to it, where you look at this as a platform to build on top of, and that's something where you've identified all these areas that you can bring in to specialize it and make it very custom to what you're doing. Yeah, I think something exciting we saw today in the announcements was opportunities for people like me with subject matter expertise, who really know a domain, where we're able to build kind of with

08:40

you all with OpenAI, not just on top of OpenAI, which is a really cool unlock for a lot of people. What are you most excited about that you saw today? The Agent Builder. Agent Builder, yeah. Yeah, that looks like it'll give you a lot of fun. And going back to, you know how it was when you first started, you had to just wire up a lot of that yourself. And code tools certainly make that easier.

09:03

But just being able to drag and drop, things like file search... The permission structure seemed really well thought out, particularly for what you're doing in a classroom situation where you really have to have those safeguards in there. Yeah. We actually built our own. And have for a number of years. And I think we're excited to go and get our hands on it and see what do we get to get rid of? What do we get to make even better now that we have access to it ourselves?

09:28

Or from you all. I think either way we're going to learn a lot from working with it, and that makes it easier for us to use, but also, again, bring it into the hands of these people that aren't technical. They're not developers. They're not thinking about which model they're using, but they just want to get their stuff done. Yeah, I think that's a great point, in that the more you can have the people who understand

09:54

who your customers are, in this case students and teachers, and understand what that problem is, and the less time they're spending wiring things up and running, you know, FastAPI servers and stuff to do this, it seems like it can make the product get a lot better, faster. Absolutely. What else have you been excited about that you've seen? I think...

10:13

We've built some similar things. In our product, we have this concept called power-ups, which are basically apps that you can use with Dot, our character. I think we got to see some good patterns and examples of that today with the new apps that were announced. I think one of the things we're really excited about, with these partners that we've been developing, is that everyone has really solidified around MCP servers as a way to communicate with AI.

10:42

A thing that will definitely benefit us is OpenAI kind of drawing a line in the sand, saying we're doubling down on this. Yeah. And so now when we go to a partner that's already building for integration with ChatGPT, they can also bring that directly into SchoolAI, again for that safe, managed, guard-railed experience that teachers and school leaders are looking for. Yeah, one of the things I think is going to be helpful too, and you mentioned this before, is evals

11:10

and the ability that as you run these systems and you run the models to be able to know how well they're performing because 2% or 3% may not seem like a lot, but it can make all the difference, particularly with a student. When you have 5 million students using your platform, 2% to 3% means a whole ton of issues every day. And it's one of those things that's often, every company I talk to knows it's important, they want to do that, but they have the...

11:34

the opportunity and the time to spend building an eval suite, to build something in there is just often kind of a thing that we'll do in the future, but having that built into a system seems like a bit. Yeah, we kind of saw it today, right? I don't have time to do the evals. Yeah, exactly. So they did an eight minute. Eight minutes building an agent. Very impressive. Yeah. Take 10 and do the evals and that'll be fun. Yeah. And that's, it's kind of exciting too, because for.

11:58

prototyping and spinning things out, because I think a lot of what you're doing is probably going to be experimental, and trying to think with teachers to be able to sort of see if these things work. Do you see that accelerating? Yeah. So one of the things we do: teachers, when they're creating these, like,

12:13

custom AI tutors, they do them lesson by lesson. It's like, hey, I'm teaching about the water cycle and I'm going to create this activity that starts out as a tutor and then turns into a game and then turns into a quiz. Just being able to preview that fast, but almost having adaptive evals in the moment, is one of the things that we've gotten into.

12:33

Really, the meta-prompting: how do we not just tell the AI what to do, but tell it how to do the thing that it decides to do? Right. Right. Yeah. It's a very interesting space to be in, where the AI can help you build the prompt to tell it what to do, but knowing where you're going to start from has been

12:48

super helpful. Where can we look forward to finding out more about SchoolAI, what you're working on? Yeah, you can follow us on X at @GetSchoolAI. You can check out schoolai.com. I think this is a fun time of year with back to school and seeing a ton of how teachers and students are building different holiday stuff. That gets really fun when teachers are sharing with each other. So, yeah, I would say

13:18

on X or Instagram or schoolai.com. Last question: any advice you'd give to developers about where to start with the tools, from what you've played with so far? Oh, I think you can probably just start in, like, the GPT builder. For most of the ideas that I hear, it's just: start with the GPT builder and then expand from there. You've got, you know, in the developer portal, a ton of other tools to just start playing with.

13:45

And then I think the Agent Builder that we saw today is going to be another fun one to start connecting the dots between all the different tools. Awesome. Thank you so much. Yeah, thank you. Caleb, hope you enjoy the rest of DevDay.

13:57

So we're going to be talking to a few other developers about some of their different experiences. And I'm always fascinated to see what gets accelerated and how they're able to focus, when a new tool comes out, on really what is their core thing they're working on. So up next we have Dani Grant, who's with Jam.dev, which is an in-browser tool for helping you evaluate your site and figure out basically how to improve it.

⁠¶ Dani Grant (Jam.dev)

14:18

Is that correct? I'm so happy to be here. And actually, later today on stage, we're going to announce a brand new tool. Oh boy. It lets any PM, designer, marketer... Oh, I'm a little short, aren't I? No, no, no. Mic's too tall. It's all good. I agree. Well, just like you just did, it helps anyone fix what's broken instantly without writing code. It's called Please Fix.

14:40

Okay, so I would use this on, like, my website, go into it and say, just, please fix? Or... Yeah, I mean, like, yesterday, before this tool was announced, if you wanted to change some copy on your site, or a button didn't look good or was missing a hover state, you'd have to probably ask an engineer to do that for you.

15:01

Nobody wants to do that. And then they'd be like, can you make a ticket? And then the ticket gets prioritized and then, like, maybe they get to it. I worked for a company, I won't name it, where we joked it was easier to release a model than put up a blog post. That's hilarious. Yeah, we'll name it. Um...

15:17

Well, so now, as of today, what you can do is, while you're looking at your site, you click on the Please Fix browser extension, and then it lets you edit your site right there, like it's a Google Doc, like it's a Figma. And then when you're ready and you like how it looks, you just click submit and it creates the PR for you, and it uses your design system, so engineers like the PRs. They're very clean, they're very tight.

15:38

Very cool. What were you most excited about in what you saw today? I mean, I think we just saw a new way to browse the web. Like, I think OpenAI maybe just changed what we mean by the web, what we mean by a browser. Like, if you think, like, Web 1 is read, Web 2 is read-write. Okay, there's Web 3, and then maybe this is like Web 4, like read-write-think. I think we just saw a whole new way of experiencing the web, where it's a lot less mechanical and it's a lot more stream of consciousness.

16:07

So you're talking about the apps inside ChatGPT. It's freaking cool. Yeah, it is. It makes you think a lot about, when you're building something, what the core functionality is, like what the purpose of it is, and the idea that it's going to be presented inside ChatGPT. And so watching the demo where they interacted with the Zillow website and actually having to have data

16:26

about that and be able to drill down into it. And it's exciting to think about the possibilities there. And so I could see for you all, that seems like a pretty interesting area because people as they focus on usability, they're gonna wanna make sure those things work really well. Yeah, like now when you build your

16:41

inside of ChatGPT app, if you're the PM, if you're the designer, and you want to tweak some things and make the app look a little nicer, you can just use our browser extension, change it from the ChatGPT interface, and make the PR in GitHub. It's so cool.

16:55

You know, these tiny tweaks, they often don't get prioritized today, but they should. Because the difference between a fine-designed product and a well-designed product is that the well-designed product changes the world. Like, the iPhone changes the world because it's usable. And there were actually attempts before that,

17:10

that I won't name, that many people under the age of 30 have never heard of, because they just didn't have that attention to detail. Have you seen your development process change with tools that have been coming out, particularly AI tools? Yeah, I mean, like, I'm biased. We've been using PleaseFix as we've been building PleaseFix, and it...

17:30

Like, even today, our pricing page changed dramatically because a PM could just go in and test a bunch of stuff without asking an engineer. I think the thing I'm most excited about is this idea that you can move as fast as your entire creative team altogether. Like, no disruption for engineers. It's like people can move without having to have bottlenecks on a few people.

17:52

So it seems kind of cool, because what you're talking about is the idea that people who aren't specifically from an engineering background are able to make those changes. And has that been something you've seen, basically within your company, the adoption of tools like vibe coding and stuff like this for people to contribute and be able to come up with ideas? Yeah, it's really cool.

18:08

We talk to our users every day, and some of the stories we hear are awesome. Like, last week we talked to a user who's a firefighter and building software for firefighters. That's cool. Talked to a user a couple weeks ago who grew up in the church system and is building software for churches,

18:23

has no software experience, but is able to now make something that's very impactful to their community. I think that is awesome. I think we are about to see the Cambrian explosion of software, the same way that with the web, there's the Cambrian explosion of news sources with Substack and Twitter.

18:37

I think this is about to happen for software, and I think it's one of the best things for humanity. So how do things work at Jam.dev as far as ideation, testing things with customers, releasing, and then figuring out what the best features are and whatnot? The only thing we care about is

18:54

does this deliver a wow for our users when they use it? And so that's what we're focused on. There's an emotional response we want people to have when they use our product because we work on the worst part of software development. Like, I don't know anyone who is like, today I want to fix some bugs.

19:08

Right. And so our job is just to make that whole part of the software experience a lot better. And so we're laser focused on that. We're just constantly talking to users. Every user that signs up for Jam hears from a co-founder. Every user who uses Jam hears from a PM. We're just in constant contact.

19:23

When I worked at OpenAI, one of the most exciting things, and it sounds absolutely silly today, was when we watched GPT-3 spit out a React button. That was like, oh my gosh, I could do it. I remember that. I don't remember it inside, but I remember it from the outside. I remember seeing demos of that being like...

19:37

The world just changed. That was the threshold for us being impressed: literally, like, four lines of code. And to be able to, oh, that's really cool, I can do that. What has been a big aha moment for you? I think... Look, it's similar to what you're saying, but when you think of what you just described, coupled with what was just announced with the Apps SDK, you can imagine that in the future, yes, humans are going to be building a lot of apps that are shown dynamically as they browse the web.

20:03

But also... I think that agents will dynamically build apps for you as you browse the web. And that opens up a whole world of possibilities because you can have dynamic software right there. Like, imagine inside of an organization. Right now, if a PM wants to see how a product is doing, they have to build a dashboard, and it's a lot faster to do that today than even six months ago. But imagine the PM needs a dashboard and they're in ChatGPT and ChatGPT just

20:28

gives it to them, and no human had to do that. It means that there are two types of software. There's sort of long-term software that humans are going to work on. They're going to, like, really fine-tune it, make it great, like Zillow, Canva. And then there's going to be sort of disposable one-time-use software that an agent can just whip up. And that is just freaking cool. Yeah, it's a thing where...

20:48

you know, as a developer, you're used to sometimes spinning up a tool one time to use that, and then you're done with it. But it's another thing to think about that could be a new modality. You know, we get kind of fixated on sort of, like, the app store sort of idea, and I think that there's going to be, like you said, long-term stuff that you deploy inside ChatGPT, but then the ability for somebody just to spin up a thing they're only going to use once, and that's the end of it. Yeah. And have you seen...

21:09

As far as things that sort of start off as hobby projects or things like that, or maybe inside Jam.dev, people will say, hey, I had this idea I wanted to solve for this thing. Maybe even something from finance or comms or something that's turned out to be something useful. Yesterday we were at the rehearsal for our 10x code session. It's for startups. We're all demoing, and we're just sitting around kind of backstage talking, and we're like, oh, how did you start your startup? And almost everyone,

21:32

three out of four, I'm the only one, where their startup started as an internal tool at their company that they needed. And actually these startups pivoted to the internal tool because they found it so tremendously valuable to how they build software.

21:46

That's, you know, how Slack got its start, and other companies. It's a very interesting thing where the thing you spend the most time on is often the thing that's going to be the best product. And, you know, one of the things they talked about, OpenAI, on here today was, like, how 70% of the code's

22:00

coming from, you know, the PRs are coming from, you know, generated by Codex. And it's a very interesting thing to see when a company is using their own tool to build the tool, it seems to iterate much faster. It's actually so funny because if you listen to standard startup advice, like what would Paul Graham tell you?

22:14

Hey, don't optimize the internal processes. Like, just do things that don't scale, do them poorly. But actually it's when you take a lot of care in your own processes that you can develop these products that can be their own companies and can help a lot of people. Hey, it might be that that is going to be the advantage,

22:29

is how quickly internally you make something, because I think you're going to see a leveling effect with tools like AgentKit and everything else, because probably having a big technical bench isn't going to matter as much as having good product depth or understanding of your customer. Yeah, at the end of the day,

22:45

a human has to use the software. And if a human has to use it, it has to be easy for the human to use it. And I do think design, still today and maybe forever, makes a difference. So you use Jam.dev on the Jam.dev site, don't you? Yeah, yeah. And if you want to see what's brand new, go to jam.dev slash please fix. Okay. How often are you guys making updates on your own site? Right now we have... You did your pricing page, you did that. So, yeah.

23:08

Too often. Does that become a sort of thing like, well, if it's so easy to change it... Yeah, we just added dark mode. We added a pricing page. We do copy updates. Yeah, now it's too easy. Is there going to be, like, a Please Fix wait?

23:21

You know, oh, Please Fix with a polite delay. Yeah. Or, like, maybe think about this, like, you know... Yeah, we should add as a feature for engineers that if you made too many changes, it, like, arbitrarily waits a few days and then asks for the change again: do you really want this? Yeah. Yeah.

23:35

What are you looking forward to? What kind of tools would you like to see that don't exist yet? Well, I was talking to the engineers sitting around me during the keynote, and they were really excited about the optimizer and the evals as part of AgentKit. And what they wanted to see was, well, like, what if

23:55

the evals could be written automatically for you using your own data and then use that to automatically optimize your prompt. So rather than people sort of sweating the details about these agents, what if the agents could improve themselves? And I think if we had that at Jam...

24:08

Like, we could move a lot faster and we could make a lot more powerful software a lot sooner. What advice do you have for founders or developers right now? It's never been a more fun time to build. I think we all just get to enjoy it. So how did you figure out that this was what your team wanted to work on with Jam.dev? How did you decide this was the problem you wanted to solve? When my co-founder and I were product managers together at Cloudflare, we...

24:32

We worked on the fastest moving team in the company. It's this sort of skunkworks team that would try to do big new things. It's the team that shipped Cloudflare Workers, their cloud compute platform. It shipped 1.1.1.1, which now fields a trillion

24:47

DNS queries a month. And it was just in trying to move fast on this team that we realized a lot of the frustrations and bottlenecks come from just reproducing issues, and there's no tooling out there, and no user, no PM, no one in sales knows how to communicate to an engineer what the engineer needs to know. And we're like, we can't believe that there's no way to help the engineer

25:09

get things done faster. We're spending all this, like, great brain time on communicating bugs and hopping in calls and screen sharing, and not enough on fixing them. And we thought that we can solve that. It seems like there's a lot of things sort of in front of you as far as what you can do. How do you decide what you're going to do next? Okay, I think there's something that we under-talk about in, like, startup world, which is...

25:37

If your startup works out to your wildest dreams, you're going to work on it for like 10 years. And that's a really long time. And so you better just love the problem because that makes it really, really fun. And so I think you work on the thing where, if you get to wake up every day, work on

25:52

it and talk to users of the thing, you're going to be pretty darn happy about that. That's pretty awesome. How can people get started at jam.dev? jam.dev slash please fix and we'll please fix some code for you. Awesome. Thank you so much. Thank you.

26:07

So it's been interesting to see, with each kind of developer, what they're looking forward to, what kind of excites them. A lot of that's based upon what they've been working on. I know a lot of devs have to build a lot of internal tooling, and when you can solve for that with an SDK, it makes your life a lot easier because you can focus on the thing you want to work on.

⁠¶ Zach Lipton (Abridge)

26:25

Up next, we have Zach Lipton, who is with Abridge. How are you doing, Zach? Not too bad. How are you doing? Fantastic. So Abridge, you're working on tools for helping the medical community, people doing, like, transcription and that kind of area. Is that... Or how would you describe it? It's an AI platform for doctor-patient conversations. Okay. Like, if we were to rewind to the state of affairs, you know,

26:45

post-adoption of electronic health records, but pre-ambient listening, doctors were spending about two hours doing paperwork for every one hour of direct patient care. So there was this kind of, like, clerical burden crisis. It was a situation where technology was pulling doctors away from patients rather than bringing them closer. What we do is we provide this platform that kind of gives doctors superpowers, helps them with their paperwork, does all the note taking

27:13

in the background, preps and everything. So all these kind of documentation artifacts are ready for them the moment the visit's over and in exactly the form they need. So they could be fully present with the patient instead of spending all their time staring at the computer. What kind of metrics do you have so far as far as, like, time

27:27

saved for doctors? Sure. Interesting story, and difficult to track down, in part because there's the time that doctors spend documenting during the day, but the reality of the status quo before was that most doctors actually didn't finish their notes during the work day. So they were home after hours logging into the EHR and doing what we called pajama time. They're basically sitting there, like, pulled away from dinner with their families or logged in after hours, you know,

27:56

sitting in bed finishing up their notes. Our reports... we cobble together this information from a bunch of different sources, but we see doctors saving as much as an hour or more a day. We see some doctors seeing, like, 10-15 patients in a day; we're talking about, like, you know, five to ten minutes often in note-taking. So it's a tremendous kind of relief. But even beyond the

28:22

actual time saved, oftentimes there's an even larger sort of, like, perception of burden lift. And that's because the doctor is worrying about less and able to focus on their patient. Yeah. I mean, that's great. 'Cause, like,

28:33

you know, look at it: an hour a day is either an hour they can spend with patients or just not have burnout and, you know, have more focus on that. We've talked before with SchoolAI, looking at how they're basically able to help teachers spend more time in the classroom with students, and that's the same problem you're dealing with here: how can doctors be there for students, or patients, and also...

28:52

Yeah, so we have a channel in Slack, and it's what we call love stories, and it's where we hear, like, kind of all the feedback from the field from doctors. And one of the wild indicators that, like, we had landed on something big, and this is relatively early,

29:05

maybe like a couple months after we launched the first, like, enterprise pilot at a hospital system, was when we started getting stories coming in, and they weren't just talking about the clinical experience, but they were talking about, like, I actually got to have dinner with my family

29:20

every night this week for the first time in like 10 years, or, Abridge is saving my marriage. And that was not where I was expecting things to land so quickly, but, you know, the problem is that big. What did you see today at Dev Day that has you excited? Oh, so many things. Maybe two that are top of mind. One, a lot of people have already talked about the agent developer kit.

29:43

And I think that's just extremely exciting. There's this moment right now where I think everyone, it's sort of a proto-discipline. So everyone's been rolling their own tools in the hope of like kind of trying to figure out like what is the paradigm? How does this work? And there's so many things that have to work together.

29:58

There's the context engineering, there's the prototyping, there's the sanity checking, there's evaluation. There's also down the road, you know, everything around production and monitoring. So I'm really excited to see where these tools ultimately go, but seeing OpenAI.

30:12

take a strong position and put out a kind of comprehensive offering that brings together a lot of these things, where, you know, people have been rolling their own orchestration tools, their own evaluation platforms. We certainly have. And seeing how much is being done to create a common platform and allow us to sort of

30:29

lift off and focus more on the content, it's something I'm super excited about. And then in general, I've just been extremely excited by, and drawn a lot of inspiration from, all the work that's gone on in terms of software developer tooling. And so we are developing AI-powered products for our customers, but we're also big consumers, as productivity tools like Codex play a big role for us. And just seeing how far we've come there. I mean, I remember...

30:58

My background's in academia. I was an AI researcher. I am an AI researcher, a professor at Carnegie Mellon. And I did my PhD back in 2010. And I remember when Ilya had a paper, Ilya and Wojciech had a paper called Learning to Execute. And it was just having...

31:12

like, you know, the idea of code going in and a model kind of anticipating the output. The fact that the model was doing anything at all in the space of code was kind of revolutionary. And to see where we are now, going from maybe two years ago, having our minds blown by just code completion, to right now, seeing these larger, full-codebase refactors taking place. You know, it's kind of amazing to see the progress in the space, and

31:40

I've been excited to follow along. So when you work in medical, it's a very high stakes area. And so that's got to be something you think a lot about, about how you deal with hallucination and also how you deal with customer concerns about these things. 100%. So what do you look forward to in tools? What have you seen the biggest help in that area? That's an area where we've had to develop a lot of our own technology.

32:02

You know, what is a hallucination? It's kind of like back in the old days of developing simple classifiers. We had false negatives and false positives. And now, kind of, everything that's there that you don't want is a hallucination. And the question is, well, what is a hallucination? Sometimes it's

32:15

completely confabulated information that is asserting facts about the world that are not real. But in the context of medical note-taking, medical documentation, order placement, there's a kind of particular, situated notion of what we really mean. It's something that's sort of unlicensed, you know, by the

32:35

surrounding context, the substantiating evidence, even if it might be true. You know, if an explanation of a disease shows up in a generated patient-facing summary that the doctor never said, yeah, you know, that's kind of bad. Even if some of the information might be either correct or plausible, that's not within bounds. So we have a kind of bespoke notion of what constitutes a hallucination, but we're able to, like, kind of

33:07

often draw a lot of inspiration from what the frontier models can already do out of the box. If we define our ontology, these are the types of errors we're concerned with, then we can go and ask: what is the ability of an out-of-the-box model, taking each documentation sentence a la carte, to correctly designate it as belonging to the right category? And we find, okay, we're already within the realm of the model being able to judge. Even if it's not able to

33:35

sort of never commit the crime in the first place, it's able to recognize when the crime has been committed. That gives us a sort of proof of concept. And now what we need to do is make it better, more accurate, cheaper, faster. So ultimately we create our own special-purpose models that are able to take in, in parallel, every single sentence in all the generated documentation and surrounding artifacts and process, for each one,

33:57

does it contain an error of an unacceptable variety, and of what kind, and then there's a kind of pipeline downstream for remediating. And we're able to do that with about 97% recall at this point.
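As a rough illustration of what a recall figure like that measures, here is a minimal sketch. It assumes each generated sentence has a human gold label (does it contain an unacceptable error?) and a verdict from an automated judge. The `SentenceJudgment` type, the field names, and the example sentences are all invented for illustration and are not from the speaker's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SentenceJudgment:
    sentence: str
    has_error: bool  # gold label from a human reviewer (hypothetical)
    flagged: bool    # verdict from the automated judge (hypothetical)


def recall(judgments):
    """Share of truly erroneous sentences that the judge flagged."""
    true_pos = sum(1 for j in judgments if j.has_error and j.flagged)
    false_neg = sum(1 for j in judgments if j.has_error and not j.flagged)
    if true_pos + false_neg == 0:
        raise ValueError("no erroneous sentences to measure recall on")
    return true_pos / (true_pos + false_neg)


def precision(judgments):
    """Share of flagged sentences that really contained an error."""
    true_pos = sum(1 for j in judgments if j.has_error and j.flagged)
    false_pos = sum(1 for j in judgments if not j.has_error and j.flagged)
    if true_pos + false_pos == 0:
        raise ValueError("the judge flagged nothing")
    return true_pos / (true_pos + false_pos)


# Toy, made-up evaluation set: 4 erroneous sentences (3 caught, 1 missed),
# 1 clean sentence flagged by mistake, and 1 clean sentence passed.
EXAMPLES = [
    SentenceJudgment("Patient reports a cough for two days.", False, False),
    SentenceJudgment("Patient was told to stop all medication.", True, True),
    SentenceJudgment("Patient has a family history of asthma.", True, True),
    SentenceJudgment("Follow-up is scheduled in two weeks.", True, True),
    SentenceJudgment("Symptoms began after travel abroad.", True, False),
    SentenceJudgment("Patient denies fever.", False, True),
]

print(f"recall={recall(EXAMPLES):.2f} precision={precision(EXAMPLES):.2f}")
```

Recall only counts the errors that slip through, so it is usually read alongside precision: a judge that flags every sentence has perfect recall and is useless in practice.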

34:10

So do you have any advice for people who are trying to work on basically developing their own evals for hallucination or just sort of a good starting point? I think it all starts with getting really crisp about what you really mean. And I think that's what we've seen before is that what is a hallucination?

34:24

for us is a little bit different from what is a hallucination for, like, a general open-world QA system. So is there kind of a boundary which you keep expanding? For instance, when we talk about medical, you say, okay, we feel maybe working in scribing right now is an area that we can probably solve for and produce a pretty good product that's at or better than human level. Then are there areas which you would expand out to as you feel more confident?

34:50

Absolutely. So for us, we've always bristled a little bit when, you know, VCs put up this chart and they're like, these guys over here are the conversation agents, and these guys here are the coding startups, and these guys are the scribing companies. And we've never liked getting pigeonholed

35:04

as a scribing company. That's because, from the very outset, the central thesis wasn't just about scribing. The central thesis was about medical conversations, about this being this moment where

35:17

You know, this is this magical spot. It's these 15 minutes that the patient waited maybe six months for, where the patient tells their entire story, where the doctor goes through their entire reasoning process. And within minutes after it's over, the patient's forgotten 80% of what happened. The doctor is, you know, hours behind on their note-taking. And so for us, we've had this feeling that, you know, I've also,

35:39

as an academic, been working on applications in healthcare. It's been my kind of passion area for about a decade. And I've been watching so many interesting machine learning ideas get developed as a proof of concept, only to kind of sit on the floor, not put to use. And so what we realized was that the conversation was this way in. It was this important arena where, in some ways, it's the most important moment in the entire experience of healthcare, and sort of

36:05

no one was providing value in that moment. And scribing, we already knew, was going to be a killer application, because, you know, human scribing was never going to scale to all doctors, but those who could afford it were willing to pay tens of thousands of dollars per doctor per year to have a sort of offshore scribe. And so that already kind of told us there was this clamor for it, there was this need for it, that we could get in. But now that we're in, I'd

36:31

like to take this broader view of what the entire picture is, rather than sort of being quietly in the background, just minding our own business, and then at the very end of the visit, boom, the magic happens. We don't want to go so far in the other direction that we become interruptive, but from a perspective of just, how do we support a doctor

36:50

for the entirety of the visit, from their pre-charting experience, you know, before the patient even comes in the room, through every kind of cue or nudge that they might need during the visit to help them make the best possible decisions, help them tick all the boxes to make sure that insurance is going to pre-approve that particular test or treatment that they're going to do, so the patient doesn't end up waiting a month for care. So if we kind of zoom

37:18

out, we kind of see this space of the conversation, the point of care, as: now that we have ears in the visit, and now that you have a sort of AI workforce to do your bidding, what are all these other jobs to be done that could be addressed in the moment?

37:32

And that includes everything from the pre-visit experience through to real-time clinical decision support, through all the anticipating and getting in front of all of the financial-related documentation that needs to be done to ensure that the doctor gets paid and the patient gets their care in a timely fashion. Yeah. You know, on one end you have people working on, you know, the AI scientists and tools, and trying to solve frontier

38:00

problems. I've talked to other people who told me that if you can have better intake in hospitals, you might be able to get rid of hepatitis, that there's some very low hanging fruits there. And is that something you've looked into or you sort of see a lot of opportunity there? Absolutely. Any particular area you'd like to see the tools get better sooner at? Oh. So many. I'd say on a personal note, they're not very funny yet.

38:25

So I don't know if you've had this experience with ChatGPT that it's far better at solving hard math problems than it is making a joke. Sora, though, is very good at jokes. I don't know if you've tried that yet. Yeah. I think...

38:40

there's a tremendous amount of work that one still has to do to crisply define every single task for the model. And I think about just how high models can climb. You know, when it comes to a crisply defined technical task, they've gotten very high up the abstraction chain in breaking it down. But I think when you get outside of that, you know, and start addressing problems by interacting with the system, kind of tackling the

39:15

the more real-world problem you're discussing, you find that you kind of have to do all the driving. Right. And the system is a little bit more of an information retrieval system. It is the world's knowledge at your fingertips, but it is not kind of connecting dots at a more abstract level. And so I think, you know, in the coming months and years, I'm excited to see to what extent the model goes from

39:42

more technical problem solver to a more independent interlocutor on the normative side of problem solving. Going back, how did you all decide this was the space you wanted to start with? You know, I think we saw a few trends that were happening all at once. Like, my research background was in deep learning.

40:08

With any one given approach, we kept running into little plateaus here and there. But if you zoom back and look at the arc from 2012 through to, you know, maybe 2018-19, when the company was founded, you saw there was a path of advances in speech recognition and advances in natural language processing that was, you know, proceeding and, if anything, accelerating. And simultaneously, there was a crisis around physician burnout that maybe in 2018 wasn't the number one

40:42

burning priority on the minds of CMIOs and hospital system CFOs across the country. But, you know, it might have been number five on their priorities, and it was rising up the charts. And so we kind of knew there was this coming crisis:

40:58

Doctors were burning out. They were spending more and more time on documentation. They were dropping out of med school. They were graduating med school with no intention to practice medicine, leaving to join tech companies, to join pharma, to do anything but practice. And so there was a churn problem, or a retention problem. And at the same time, we kind of knew that the right family of tools was coming to fruition simultaneously.

41:23

And, you know, for us, I think me coming from, I was saying before, this kind of academic perspective, having, you know, a lot of us before had maybe operated in the machine learning for healthcare community, a little bit on like, what feels like a cool or important predictive problem.

41:42

but without maybe connecting all the dots when it came to what the priorities of the health system were, what the pain points of the health system were, what the choke points in care really were, maybe coming a little bit more from what seems like an interesting clinical predictive problem. And I think at that moment, we had a little flash of insight, though we didn't know if our timing would be right. Keep in mind that in 2018, the typical context length for a language model was maybe,

42:07

I don't know, 256 words, and these conversations are like four to eight thousand words at the median. But I think we just saw a convergence of a few trends that all spelled that there was this real opportunity

42:22

to save time for doctors and ultimately, hopefully, you know, save money and also save lives. So in an area like medicine, which is very high stakes, what advice do you have for developers that are trying to build trust with their product? Because you know better than anybody that there's been kind of a road of broken promises, of people who were very frustrated by, you know, oh, this is going to do this

42:45

work. And when you come in with something that says, hey, it really works, how do you win them over? Oh, I think it's never-ending. I think trust is earned every single day. I mean, there's the initial trust of, we've been talking about this vision for a long time, and then we actually built the product and got it to work.

43:10

But there's also the trust that's built through, you know, working with hospital systems, which is kind of a high-touch, white-glove enterprise. And I think, through continued delivery on everything from our product commitments to our data security commitments to the kind of service we give people,

43:32

the continued fulfillment of every kind of promise, and continuing to expand and serve not just, you know, initially maybe more primary care and ambulatory, but now emergency, inpatient, and nursing stakeholders, I think this kind of accrues through this continued delivery of everything, from the product through the experience of the medical teams that are working with us in partnership, over the course of, now we're talking about,

44:01

you know, many years. Awesome. Well, thank you very much. I appreciate it, Zach. And it's Abridge, if people want to find out more. It sounds like a very exciting space, and we'll see where you guys are headed. Yeah. Thanks for having me. Enjoy the rest of Dev Day. See you.

44:14

It's interesting to see where you have companies that are dealing with very high-stakes stuff like medical or education, and a lot of it is the trust building. It's not a thing where you just pop out with a product and you say, hey, we're ready, we're done. You can see it from the examples here, between Danny and Zach and

44:32

Caleb, of how they have to basically show the customers one step at a time, iterate on the product, and improve it. And here to talk to us about tools for helping work on this is Lee Robinson from Cursor. How are you doing? I'm doing well. Thanks for having me.

⁠¶ Lee Robinson (Cursor)

44:44

Lee, it is great to have you here. So, Cursor. I probably have about three Cursor windows open right now on my computer that I'm thinking about right now. It has been an incredible product evolution for you all here. Just a little backstory. I was at OpenAI when we worked on the earliest version of Codex and code completions. And I kind of naively thought that, oh, well, I just ask, you know, GPT-3, or 3.5 at that time, or the Codex model or DaVinci code, whatever, and say just...

45:13

complete it and it's done. I got my code. I'm like, that's it. Code is solved with AI and we're done. Yep. Turned out that's not the case. Yeah, it turns out there's a lot that goes into it, especially going from maybe simple text autocomplete to where we're at now, with fully autonomous coding agents that can self-correct and fix their own errors and pull in information from the outside world. And it's wild how much better the coding-with-AI space has gotten just in the past year, I would say.

45:40

And it's a tool, very much. What I appreciate is the fact that you guys are using Cursor to make Cursor better. Definitely. Yeah. One part of our culture that I think helps us produce a better product is, of course, dogfooding: how we build Cursor, with all of our engineering teams, all of our design teams, product management teams using the product and using Cursor agent to build the actual experience, and for every part of their daily

46:03

work. But then it allows us to give not only hard evals on how a model works, but also just what it feels like every day to use the product. You know, sometimes the feedback is, this just didn't feel right. I don't know what it was with this model experience, but we need to tweak this a little bit here. And that is actually just as valuable

46:23

as an eval that shows that maybe we need to tweak this number a little bit. So the vibes are important too. How do things work internally as far as adding features? I imagine when you're a company filled with developers, building a tool for developers, there's a lot of suggestions on what you should be doing. But how do you prioritize? Yeah, I think across the entire company, anyone has the ability to contribute features, and we kind of measure

46:46

if a feature gets product-market fit internally. So people make changes, they push them to main, and they see if it gets adoption. They kind of push it out to the company Slack and say, hey, we built this feature, here's what it's good for, we'd love for you to try it out. We've even seen people incentivized with bagels and other pastries to get people to try out the feature. And they get a lot of feedback internally on if it's good

47:09

or what should change. We also have weekly demo sessions where people show off the cool things that they're working on. Some features don't work out, and we end up removing them. And others really take off. People have been wanting this for a while, we build it internally, and all of a sudden we look at metrics and we see how many internal devs are actually adopting the feature and what the churn rate is. And if it reaches a certain threshold, we have the conversation:

47:31

maybe we should actually release this externally. So then we go through kind of a series of steps where it starts internal, then maybe rolls out to some of our ambassadors, rolls out to some of the people who opt into our nightly channel, which, you know, releases all the time, and then eventually there's the path towards getting it out to everybody.
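The staged rollout described here can be sketched as a simple gate. This is only an illustration under stated assumptions: the stage names follow the conversation, but the `next_stage` function, the metric definitions, and the threshold values (`min_adoption`, `max_churn`) are hypothetical. The speaker describes a human conversation once a threshold is reached, not an automatic promotion.

```python
from enum import Enum


class Stage(Enum):
    INTERNAL = 0
    AMBASSADORS = 1
    NIGHTLY = 2
    EVERYONE = 3


def next_stage(stage, adoption_rate, churn_rate,
               min_adoption=0.4, max_churn=0.2):
    """Suggest the next rollout stage for a feature.

    adoption_rate: fraction of the current audience using the feature.
    churn_rate: fraction of adopters who stopped using it.
    Both thresholds are made-up defaults for illustration only.
    """
    if not (0.0 <= adoption_rate <= 1.0 and 0.0 <= churn_rate <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    if stage is Stage.EVERYONE:
        return stage  # already fully released
    if adoption_rate >= min_adoption and churn_rate <= max_churn:
        return Stage(stage.value + 1)  # widen the audience by one step
    return stage  # keep iterating at the current stage


# A feature half the internal devs use, with little churn, moves outward.
print(next_stage(Stage.INTERNAL, adoption_rate=0.5, churn_rate=0.1))
```

In practice the output of a check like this would more plausibly open a discussion than flip a release flag, which matches the process as described.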

47:47

So one of the challenges in AI now is evaluating models. We can put something into one of the evals, one of the benches or whatever, and say, oh, it performs this well. But it's entirely different when you put it inside of an IDE and you have to basically put a harness around it. Right. So how much time do you all have to spend really evaluating these things to figure out if this is going to be a good fit or not?

48:07

Yeah, quite a bit. I mean, one of the things that we love working with OpenAI on is getting access to integrate the model early, working with your team on tweaking the prompts, and getting the harness updated as newer models come out. I know we spent a lot of time on this with GPT-5.

48:22

We were able to delete some of our system prompt, since as the models get better, you don't have to be as precise with the instructions that you're giving them, and the tools are getting better too. So we spend a lot of time not only dogfooding and just trying it for a bunch of tasks internally,

48:36

but also having many of the engineers internally all try to build new features on the core product experience with new models as they come out. So then you get a good range of, you know, maybe smaller tasks, but also real tricky engineering problems, like real gnarly bugs in a very large codebase. Yeah, it's interesting that

48:53

a couple of years ago, people sort of wondered what was going to happen in the model space. Was everything going to be using big foundational models, or whatever? But you guys have sort of an all-of-the-above approach. If I want, I can choose from a variety of different models in there. You can use Codex, whatever, inside of there. But also, you guys trained

49:08

your own models for the smaller tasks, like tab completions? Yeah, it's been an interesting journey to watch. The first versions of Cursor were really focused just on code autocompletion, predicting the next line or maybe the next action. And now we've kind of moved

49:22

from a world of using an off-the-shelf model to training our own model for autocomplete to now we do online reinforcement learning where we can actually roll updates every 30 minutes to the model, which is pretty amazing just based on the signals we get back from developers, whether they're accepting or...

49:36

rejecting changes that the autocomplete suggests. And that's been really helpful where you can have both these really focused and intentional models for very specific parts of tasks in coding. And then as you all make better models, these foundational models, we can integrate them and make that experience really great too. I'm seeing more and more of my friends who never even thought about coding start off with writing something maybe in Canvas inside of ChatGPT and then all of a sudden

50:05

they go, oh, how would I deploy this? And then they go try Cursor and try to do this. How much have you seen the demographic of the people using your tools change? Yeah, first and foremost, because we dogfood Cursor to build Cursor, we're always building for the professional software engineer, and we're trying to make that experience really great. But as a side effect of making the product easier and more accessible for engineers, it

50:27

kind of welcomes in this whole new generation of people who maybe were on the fringes of coding or had tried coding in the past, and now it's much more accessible for them to do that. So a lot more product managers, a lot more designers. Our support team uses Cursor quite a

50:42

bit. And as more of these people have come in and started to use the product, it's allowed us to actually change some of the core features we're building. So one of the things that we're releasing right now is a new way of viewing the Cursor experience, where it's less like a traditional code editor and more about working with the agents. Because for a lot of people who haven't coded before, you open up an IDE

51:05

and you're like, what am I looking at? All this stuff. All this file tree on the left is very overwhelming for people who are just getting started versus opening up something that looks closer to a ChatGPT. You're like, okay, I know this experience. I've got my agents on the left. I can type into my...

51:19

input text box. And so far, we've seen this really resonate with that type of persona, where they're graduating into becoming a developer. I went from somebody who was first copy-pasting from the playground or whatever, before we had ChatGPT, building some simple little custom tools, to then using an IDE, in this case Cursor, and using tab completions, and now really just sitting down and saying, I just need to write a really good agents.md,

51:46

that early description. Have you seen those patterns shift? Yeah, I think as people get more advanced with working with coding agents, they really start to realize that high-quality context in is going to give you much higher-quality responses from the models. We can build a lot of this into the core product itself, but some of it just depends on your use case, what you're trying to build, the system requirements. And a lot of this stuff gets put into an agents.md file

52:15

or gets put into newer features we have, like planning. So you can actually work through a plan mode, go back and forth with the agent to do some research on what you're trying to build, and it can search your codebase and kind of help paint a picture. And then when you're handing it off

52:30

to a model to do coding, it's already got all of this amazing input context, and the quality of the code generated is just significantly better at that point. So we've seen that be very helpful too. What would you advise for best practices

52:42

for just generally adopting Cursor? Starting there, yeah. Yeah, I think, for people who are completely fresh to using Cursor, and we'll start with more of the professional engineer use case, then we can talk about others, but for professional engineers, I think what we see for most people is that their on-ramp is they start using the editor just like they would

53:00

traditional editors or IDEs, where mostly they're using tab and they're doing their normal coding, and they're still getting AI suggestions as they write along, and they're using agent to somewhat augment that experience. But they're not coding, maybe, agent-first. But I think slowly, over time, as the models get better and the tools get better,

53:18

they're graduating then into, what would it look like if I can hand off more gnarly tasks to the agents, to maybe run in the background or go refactor some part of the codebase? So I usually recommend that progression for professional engineers. It's kind of the inverse for those who are not professional engineers, because

53:36

it's very hard for them to drop into code and then know, okay, this is JavaScript. Oh, actually this is TypeScript, and this is what a type is, and this is what all these fancy words mean. It's much easier for them to take the other approach, where they start with the agent view. They talk to the agent in natural language,

53:50

and the agent outputs code, and then they can ask the agent, what does that mean? What is a const versus a let? You know, what do these things mean? Or something in Python, for example. So generally that's what I recommend.

54:03

Where do you think this is going to be headed? We're five years in from GPT-3. You know, as I mentioned earlier, the first time, seeing somebody make a React button was exciting. Yeah. And here we are now, where you have an entire codebase going through. They're doing this. Where's it going to be five years from now?

54:17

There's still so much of software engineering, not coding, but software engineering, the professional job, that is a lot of mundane, repetitive tasks that engineers are not excited by. There's being on call and having to go through the fire hose of data to figure out how to solve an issue,

54:37

managing all of the incoming bugs that users are reporting and trying to figure out the right ones to work on and then actually getting them addressed. There's so much more that goes into how the actual software gets packaged and delivered and shipped than just

54:49

producing code. And I think in the next, let's call it, year to two years, hopefully there will be a lot better solutions across the industry for making that part of software engineering easier. Like, I imagine a world where you wake up in the morning and you're able to review

55:04

code that's already been generated and tested. You had customers report issues, and they actually got fixed overnight, and you could just review the output: yes, this looks okay, it passes all of our tests, accept and merge. And a world where maybe on-call isn't as painful, and maybe code reviews are actually fun to do. You know, I think we can get there faster than what people might expect, and

55:26

I think both in exploring things, just working with models directly, and then the products around them getting better. What surprised you the most? What has been your big aha moment? In working with Cursor? Cursor, anything. Yeah, I think for me, it is incredible how, when you're in the San Francisco tech world, it seems like everyone's coding with AI, right? Like everyone's using these tools. And then you talk to people around the world and you realize we're still so early.

55:52

We are so early. I mean, five years in from that first React component generated. And for a lot of people, this is still, you know, this year is the first time they're really coding with AI. So for the long tail of the industry and then the next.

56:05

however many million people who become developers, it's never been a better time to actually start learning how to code in general, but then also how to use AI to write code and review code. I was at a college campus in the Bay Area, I won't name the campus, and this was last semester, and I asked the students in the CS program,

56:24

what are they teaching you about agentic coding? What are they teaching you about AI code completion? I'm sure nothing. Nothing, nothing. Not even a one-day class on it. And it was weird, because, like, I get it: learn the fundamentals. If you have, you know, a deeper understanding of...

56:38

Python, these other languages, you're going to really be excellent at this stuff. Absolutely. But there was none of that. Yeah. The education system is going to have to change a bit specifically around coding because it's evolving so fast. It's being speed run.

56:53

And you 100% need to learn the foundations of CS. That's how you're going to have a successful career as a software engineer. But you need to know the new tools that are changing. At this point, every few months, they're just getting better and better and better. So this is one of the things I work on full-time at Cursor:

57:07

teaching developers and putting out educational material. So we just put out a course teaching devs and hopefully new devs a lot of these fundamentals. What is context? What are tokens? What's a context window? Like for a lot of people, this is gibberish. They don't know what this means yet.
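For readers new to those terms, here is a minimal sketch of the idea of a context window. It makes a loud simplifying assumption: it treats whitespace-separated words as tokens, whereas real models use subword tokenizers, so real counts differ. The function names and the window size are illustrative only.

```python
def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fits_in_context(text: str, window_size: int) -> bool:
    """True if the whole text fits in a window of `window_size` tokens."""
    return rough_token_count(text) <= window_size


def split_into_windows(text: str, window_size: int) -> list[str]:
    """Break text into consecutive chunks of at most `window_size` words."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + window_size])
        for start in range(0, len(words), window_size)
    ]


# Ten words against a four-word window: too long to fit in one pass,
# so the text is split into chunks of 4, 4, and 2 words.
sample = "one two three four five six seven eight nine ten"
print(fits_in_context(sample, 4))
print([rough_token_count(chunk) for chunk in split_into_windows(sample, 4)])
```

The same arithmetic explains why a model limited to a few hundred words of context could not take in a conversation that runs to several thousand words in a single pass.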

57:22

Yeah, it just struck me, as an aside, because I was thinking about students coming out who are going to want to go work in companies, and the companies that are moving really fast are the ones where you understand that. And my friends who were brilliant computer scientists, who understand it really deeply and who are using these things, get the best

57:41

of both worlds. They understand deeply how the stuff works, but they also understand how to apply this. Yeah. So, you know, whatever we can do to encourage more of that in education, I think, would be great. Well, just like how ChatGPT has the learn mode built in, I would love, and this is a side project maybe, but I really want to build this into Cursor. Like, I want

57:59

engineers, especially those just getting started, maybe a freshman in college, to be able to use the tool and learn as they're building. Because for me personally, I think for most people who write code, applied application use of coding is actually how I learned the best.

58:13

can't just read all of the CS textbooks. I actually need to build something with the data structures or algorithms. It's like, oh, then it clicks. So if we can build more of that into the core product experience, I think that would be really helpful. Yeah. Two of my close friends who were never coders,

58:28

their first introduction after playing around in ChatGPT was to go into Cursor and to learn inside of there and also understand that, you know, once you're even using it as is right now, having something that can explain to you what a server does, how to run it locally and do that is an amazing lift, and it seems like that would be kind of a really great addition to it. Do you feel like we're going to see kind of the vibe coding era sort of just evolve into where we just call it coding?

58:54

Yeah, probably. I think one of the things with vibe coding that's interesting is certain words just catch fire because they put a phrase to something people have been trying to explain for a while. And to me, vibe coding is this idea of it's never been easier to create prototypes. You can just try out ideas,

59:12

put it into ChatGPT and get a canvas and, like, see what it looks like, or open up Cursor and ask it to build an idea. And that doesn't necessarily mean that you have to ship that code, because there is a long tail of software engineering where it's kind of like the iceberg memes, where it's like on the top it's vibe coding and then you're like, oh wait, there's actually a lot more to building and delivering complicated software. Like we saw today with AgentKit, like,

59:35

there's a lot that goes into building successful, reliable, observable agents. So I love that it's kind of opening up the top of the funnel and making software creation accessible to more people. And then as they go down, they realize, okay,

59:49

There's more complexity here. And I think over time that just evolves into this new world of software that will probably look quite a bit different than the past five years. Yeah, as a writer, we use the term "seat-of-the-pantser" for somebody who would just sit down in front of the typewriter and just go and tell it.

01:00:03

Maybe find a story. Or if they're a famous horror author, have to kill everybody off halfway into it and then come to an ending. Or, you know, planners, you know. And I think that sometimes you see that pantsing is a great way to explore and find out new things. So I never would have tried that and did it. And then when you sit down and say, no, I know,

01:00:18

and I know what to do, I'm going to go proceed this way. This has been awesome. Thank you so much for joining us, Lee, and people can find, you know, Cursor everywhere. It's pretty awesome. I highly recommend people play with this, and it's been... Just like I said, it's like my favorite thing to do is just open up some tabs and just throw some crazy thing in and see if I can make it. Yeah. Yeah. Thank you. Have a great Dev Day. Yeah. Thank you, everybody, for joining us.

✨ This transcript was generated by Metacast using AI and may contain inaccuracies. Learn more about transcripts.
