Today's guest, Russ, is the founder of LiveKit. LiveKit powers the live audio for ChatGPT. To demo LiveKit, we learn how lightning works.
And that's why lightning often strikes tall objects. Make sense?
Can you explain it to someone who's more of an idiot? How LiveKit turned down a $20,000,000 acquisition offer.
I left that meeting, honestly, like a bit terrified. I mean, this is, like, existential. Like, these guys are gonna destroy me.
And how they started working with OpenAI.
I get this email, and I was like,
what? We talk about how the heck you prepare to go live with a company like OpenAI.
And then I got a message from the team, and they're like, we're ramping ChatGPT voice mode to all free users in fifteen minutes. How do you feel about that?
Russ is a bloody legend. Enjoy the episode. I feel very passionate about, like, the voice parts, because it's just such an amazing way to learn things, especially. It's just I know. Incredible to just chat with it. And you can ask so many dumb questions.
Yeah. It's and you just do automatically. Right? It's like you don't even think about it. You just, oh, yeah. Tell me this. I mean, you just once you know that it's not a person who has a capacity to judge you and make you feel bad about yourself, it's just it's freeing. You can just ask it whatever.
What have you, have you had any, like, really fun ones that you've gone deep on with ChatGPT?
I've gone deep on a few. I learned about how, like, the sun works. I've chatted with it a bunch about religion too, but we probably shouldn't talk about that one.
Maybe it's not related, is it? I don't know. No.
I chatted about, I learned a lot about lightning. Lightning is super crazy. I didn't really understand
How does lightning work?
I think there was this one part about lightning that I didn't understand, and that was, so you have this cloud, and inside this cloud is kind of like this crazy water ice something storm that's happening inside. And, like, there's this charge being built up by, like, particles, I think, hitting each other and, like, electrons being knocked off. But what's interesting is that, like, there's a negative charge that is building at the bottom of the cloud. And you're like, well, I kinda don't get it then. Like, if there's a negative charge that is being built up at the bottom of the cloud, why is there, like, this spark or this, you know, flow of energy from the negative bottom of the cloud to the ground?
Right? Like, I guess I always thought of grounding as, like, that the ground itself was negative or something, and so that there was this flow from positive to negative. But the bottom of the cloud ends up getting negatively charged. And what it does, that negative buildup of electrons at the bottom of the cloud, it pushes, because likes repel, right? So it pushes the electrons and negative particles at the actual ground on, you know, Earth.
It pushes those farther into the Earth. And so the negative and the negative repel, so it pushes the electrons at the Earth's surface deeper into the ground, and thus the surface of the Earth becomes more positively charged. And then when that builds up enough, then there's this lightning bolt that is the flow of electrons from the negative at the bottom of the cloud to the positive charge at the surface of the Earth. I think that's right. I don't know.
You're gonna have to check.
Wait. Let me pull it up, because I feel like
You're gonna have to check me on it, but I think that's right.
I'm less checking you. It's more just showing off that you built this cool thing. Okay. Pretend I'm an idiot, which is not hard. Explain to me how lightning works, particularly the negative-to-positive flows, and how clouds develop this kind of static stuff, as I understand it.
Sure. Lightning happens because of the separation of charges in a thunderstorm cloud. Inside the cloud, you've got ice particles colliding. This creates a charge separation with the top of the cloud becoming positively charged and the bottom becoming negatively charged. When this charge difference becomes large enough, it creates a giant spark, which is lightning.
The negative charge at the bottom of the cloud seeks out a positive charge, usually on the ground, and that's why lightning often strikes tall objects. Make sense?
Can you explain it to someone who's more of an idiot?
Sure. Okay. Imagine a cloud is like a big battery. Inside it, tiny ice bits bump around making one part of the cloud negative and another part positive. When these charges get big enough, zap. A lightning bolt jumps from the cloud to the ground to balance things out. That zap is what we see as lightning.
Okay. Wait. So it kinda leaves, the charge leaves, because there's too much difference?
Exactly. When there's a big difference in charge between the cloud and the ground, the cloud wants to balance it out. So the charge moves from the cloud to the ground in a big flash. Ask her what happens to the negative particles in the ground. What happens
to the negative particles in the ground?
Great question. When the negative charge from the cloud meets the positive charge on the ground, they neutralize each other. This discharge evens out the charge difference, and that's what we see as lightning. No. But what about the existing negative particles before the lightning bolt strikes?
Wait. What about the existing negative particles before the lightning bolt strikes?
So before lightning strikes, the ground isn't full of negative particles, it's mostly neutral. But as the cloud builds up a negative charge, it repels electrons on the ground, making the surface positively charged.
Oh, wow. The lightning Yes. Wow. Wait. So how does it how is the that's quite far away. Like, how is it able to repel before the lightning hits?
So the cloud's negative charge creates an electric field that extends Woah. Way down to the ground. This field pushes away electrons in the ground, leaving the surface positively charged.
That's unbelievable.
Cool. I got it right. Thank you. Well, okay. Well done, Russ.
Yeah. Thank you. Yeah. Well, I'll chat with you later, then. You taught me all about...
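As an aside, the mechanism described above checks out as a rough back-of-the-envelope. Here's a sketch treating the cloud's negative charge center as a point charge; the numbers (around 20 coulombs at roughly 1,500 meters) are illustrative assumptions, not from the conversation:

```latex
% Order-of-magnitude only: real storms are extended charge layers, and the
% charge (~20 C) and height (~1500 m) are illustrative assumptions.
E = \frac{1}{4\pi\varepsilon_0}\,\frac{Q}{r^2}
  \approx \frac{\left(9\times10^{9}\ \mathrm{N\,m^2/C^2}\right)(20\ \mathrm{C})}{(1500\ \mathrm{m})^2}
  \approx 8\times10^{4}\ \mathrm{V/m}
```

A field of that order is enough to push electrons near the surface deeper into the ground, which is the induction effect Russ describes, but it's far below the roughly 3 x 10^6 V/m needed to break down dry air, so the actual discharge relies on locally concentrated fields, at sharp points and along the stepped leader, which is also why tall objects get struck.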
Okay. And so maybe we should have done the context before, but I could edit this in afterward.
Yeah. Yeah. You can.
We were using LiveKit technology there. Right?
You were. Yes. So, you know, when you tapped on that button to enter into Advanced Voice Mode, there's an agent that was built using LiveKit's framework on the back end. It's connecting to LiveKit's cloud network, and then your phone is also connecting to LiveKit's cloud network. And then when you talk to the model, all of your audio data or voice data is traveling over our network, arriving at the agent on the back end.
That agent is then relaying that audio into GPT-4o. It's thinking about what you're saying. And then when the response is generated, that's going back through our agent framework, back through our cloud network to your phone, and then played out. You can kinda think of it sort of like Cloudflare for multimodal AI or media streaming.
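For a sense of what that agent looks like in code, here's a minimal sketch using LiveKit's open source agents framework in Python. Hedged: the class and plugin names (AgentSession, Agent, openai.realtime.RealtimeModel) are from memory of the livekit-agents SDK and may not match the current API exactly, so treat it as the shape of the thing rather than copy-paste code:

```python
# Minimal sketch of a LiveKit voice agent. Hedged: AgentSession, Agent, and
# openai.realtime.RealtimeModel are recalled from the livekit-agents Python
# SDK and may not match the current API exactly.
from livekit import agents
from livekit.agents import Agent, AgentSession, JobContext, WorkerOptions
from livekit.plugins import openai  # assumed plugin package


async def entrypoint(ctx: JobContext):
    # Join the same LiveKit room that the user's phone is connected to.
    await ctx.connect()

    # The session pipes the user's audio into a speech-to-speech model and
    # streams the generated audio reply back out over the same room.
    session = AgentSession(
        llm=openai.realtime.RealtimeModel(),  # assumed constructor
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )


if __name__ == "__main__":
    # The worker registers with LiveKit Cloud and is dispatched a job
    # whenever a user session needs an agent on the other end.
    agents.cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```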
So from OpenAI's perspective, are they dealing with the audio, or is it, like, they're just getting kind of a transcript?
Yeah. So what's happening there is that, I wonder if you could ask Advanced Voice what's happening. Anyway, no. But let's not do that. It probably won't know. I don't think it's gonna know.
Can you explain how LiveKit works?
Exactly. It's like a big state
Wait. We're on the previous question. No. Wait. Wait. Forget lightning. Can you explain how LiveKit works?
Sure. LiveKit is an open source WebRTC platform for building real time audio and video applications. It handles all the tricky bits like low latency streaming, media synchronization, and scalability. Basically, it provides the infrastructure so developers can focus on building features, not dealing with the complexities of real time communication.
But wait, if OpenAI is using LiveKit, do they deal with transcriptions, or do they just get the audio themselves and then they transcribe that, handle that?
So if OpenAI uses LiveKit, they can handle the audio streaming and then send the audio data to their servers for transcription. LiveKit would manage the real time communication, while OpenAI would focus on processing and transcribing the audio.
Russ, are you happy with that?
I'm happy with that. That's definitely accurate. Yeah. We effectively allow OpenAI to run compute wherever they want. Right?
And then we have servers all around the world that kind of give a user the lowest latency connection to the fiber backbone of the Internet, carry their audio data to wherever OpenAI has compute, and then they take that audio data or that video data. So if you're sharing your screen or pointing your camera at something, they take that audio visual data, and they can send it into their model, and then the model generates a response. So the model can generate transcriptions. It can generate an audio response in the case of a multimodal model like GPT-4o. And so you can kinda think of it as, like, GPT is this brain that's answering all your questions and can understand the nuances of your speech and things like that.
And then LiveKit is the nervous system that carries the information to that brain and from that brain.
Yeah. And I guess it's really hard because it's big files, you've got, like, this constant two-way stream. Is that the big challenge?
You can kinda think of it like that. I mean, the one key bit is that it's not a file. So, like, if it was a file, the impact of that is that it's like an intermediary storage mechanism, which means it's kind of like a buffer. And so when you're talking about kinda real time interaction, right, like we're talking right now, there's no file intermediary between my audio coming to you and your audio coming to me, and video as well. It's kind of like a point to point streaming.
It's not true point to point if you look at, like, network topology, but there's no intermediary storage happening. My bytes are going straight through to your computer, and your bytes are coming straight through to me. In the AI use case, the bytes are flowing straight through to an agent that's connected to you, in a similar way that we're connected, and then the agent is taking those bytes and processing them like your brain is doing right now.
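A toy way to see the "no file in the middle" point: each frame is forwarded the instant it arrives, and nothing is ever accumulated into intermediary storage. This is a conceptual sketch in plain Python; the frame format and queue wiring are made up for illustration:

```python
# Conceptual sketch: real-time media forwards each frame as it arrives.
# Nothing here is the actual LiveKit pipeline; it's just the idea.
import asyncio


async def capture_microphone(out: asyncio.Queue):
    """Stand-in for a capture loop: emits one 20 ms audio frame at a time."""
    for i in range(10):
        await asyncio.sleep(0.02)             # one frame's worth of audio
        await out.put(f"frame-{i}".encode())  # a few hundred bytes in reality
    await out.put(None)                       # end-of-stream marker


async def relay(inbound: asyncio.Queue, outbound: asyncio.Queue):
    """Each frame is passed along immediately; nothing is written to disk."""
    while (frame := await inbound.get()) is not None:
        await outbound.put(frame)
    await outbound.put(None)


async def play(inbound: asyncio.Queue):
    """Stand-in for the far end's speaker."""
    while (frame := await inbound.get()) is not None:
        print("playing", frame.decode())


async def main():
    mic_to_relay, relay_to_speaker = asyncio.Queue(), asyncio.Queue()
    await asyncio.gather(
        capture_microphone(mic_to_relay),
        relay(mic_to_relay, relay_to_speaker),
        play(relay_to_speaker),
    )


asyncio.run(main())
```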
This episode is brought to you by WorkOS. If you're building a dev tool, at some point your customers are gonna start asking you for enterprise features. WorkOS offers you single sign-on, SCIM provisioning, and audit logs out of the box. WorkOS is trusted by Perplexity and Vercel, as well as Workbrew, a Homebrew management startup that I recently interviewed. I just told Mike that WorkOS is the sponsor, and this is what Mike said.
Yeah. So WorkOS isn't paying me any money for this. I I pay WorkOS money for this, but WorkOS is like one of the best developer tools I've, like, ever used. It's it's the documentation and the experience with building with them is so, so good. Like, I initially was almost like, okay.
This seems expensive, but then I built an integration with them in about twenty minutes that I had spent two days banging my head off the wall trying to build directly with Okta. And then with WorkOS, I then have, like, support for many, many SSO providers instead of just one. So, yeah, like, for me, WorkOS is one of the nicest developer experiences I've encountered in the last, like, five years, probably. And it's not surprising, because a bunch of the developer team are ex-GitHub
and therefore very good at the job. Go to workos.com to learn more. Yeah. Okay. That makes sense. And I've not done much stuff with, like, streaming, but I know it gets extremely tricky.
It's tricky. It's it's very tricky. It's pretty complicated, but also not it it up until recently, it it was not used that much on the Internet. I mean, Zoom does it. Discord does it.
Netflix does a flavor of it, but it's not real time. You know, when you watch Netflix, it just buffers. You sit there, and, like, sometimes if your Internet's weak, it'll buffer for thirty seconds. What's happening there is there is a file intermediary in the Netflix case, and it's actually going and downloading a part of the file. And then it starts to play the video.
And as you're playing the video, it's downloading more of the file and sticking that data onto the back. And, eventually, you play through, you know, the entire movie or TV show that you're watching with Netflix, but real time is a bit different. There is no storage or file that you're downloading. All the data is generated dynamically, it's captured from my camera and my microphone in real time and then sent over a network to you and then reconstituted and played out through your screen or through your speakers on the other side. And so I think it's not a common thing that people have had to do, but now with this multimodal AI and the computer getting really smart and us building, like, HAL 9000 or whatever, I think streaming is gonna become a much more common way that you interact with computers.
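The Netflix-style flow he's contrasting looks more like this: there is a file on a server, and the player pulls chunks of it ahead of playback. A hedged sketch using HTTP Range requests; the URL is a placeholder, and a real player also deals with codecs, adaptive bitrate, DRM, and so on:

```python
# Progressive download, the buffered flow Russ contrasts with real time.
# The URL is hypothetical; this only illustrates the file-in-the-middle idea.
import requests

URL = "https://cdn.example.com/movie.mp4"  # placeholder file on a CDN
CHUNK = 1024 * 1024                        # fetch 1 MiB at a time

buffer = bytearray()
for start in range(0, 10 * CHUNK, CHUNK):
    resp = requests.get(URL, headers={"Range": f"bytes={start}-{start + CHUNK - 1}"})
    buffer.extend(resp.content)  # the buffer IS the intermediary storage
    # Playback would read from `buffer` while later chunks keep downloading;
    # if the network stalls, playback catches up with the buffer and pauses.
```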
So it's gonna be in more places.
Yeah. And going back to the OpenAI part. Mhmm. Could you maybe share a little bit, like, what your world with LiveKit was like before? I mean, also the question is how they started using you, what those conversations looked like. But did that change what you were building at all?
Yeah. It definitely did. It is certainly a time, like, in the history of the company where it's very clear a few things changed, including my stress level. But, you know, it's Good
stress, though.
Yeah. Good stress for sure. Grateful for the stress, definitely. It does hurt, though. You know, it's like exercise. It's rewarding when you look back on it, but it definitely sucks in
the moment.
Yeah. Yeah. Yeah. And so I remember, it's a funny story. I don't think I've told this story. Maybe I have. I'm not sure. I don't think I've told it yet.
Definitely an exclusive. Well, we'll go
about an exclusive. Yeah. Yeah. Yeah. We'll have two exclusives on this show.
One is actually answering questions with ChatGPT live, and then the other one will be the story, but I'll censor it a little bit. So there was a really large company. I remember, I think it was, like, August 4th, 2023, or August 3rd, 2023, one of those two days. I think it was the fourth. And I went to this lunch for five hours with this really large company, and they kind of ran through this gamut of, like, options for me.
They were like, can we buy you? Can we license your cloud product? Can we contract you to build the same thing at our company? Or can we maybe wrap your stuff and launch our own version of it to compete with you? Because we're an open source project. Right? All of our stuff is open source. And so I was kinda taken aback, and I asked. I said, well, you know, if you were gonna buy us, like, how much would you buy us for? I'm just curious.
And the economy was bad, and, like, effectively, in the end, they didn't wanna tell me because they thought it would be an insulting offer, but I pushed. I insisted anyway. And they said, well, we can buy you for $20,000,000. And I was like, well Wow. I mean, that's awesome.
And, you know, thank you for that, but we've also raised 15, so it's not great. You know, just mathematically, it's not gonna be great. And they're like, well, look. If you can tie what you're doing at LiveKit to GenAI, two hundred million. And I was like, one, that's kinda ridiculous you said that, but two, no.
Like, we're like a video conferencing, live streaming infrastructure company. Like, I can't tie it to GenAI. I can't position it in that way. It doesn't make any sense. So I left that meeting, and I was, honestly, like, a bit terrified because they're a big company.
They're very respectable. Like, I admire a lot of the stuff that they do. And so I was like, wow. I mean, this is, like, existential. Like, these guys are gonna destroy me. Like, this company is gonna be dead. And so I'm driving home on the 880. It's a terrible highway. Never take it at rush hour if you can manage that. And then I get this email, and it's like, hello from OpenAI.
And I was like, what? And so I chat with them, and we start to talk. And they wanted to build a voice interface to ChatGPT and had been exploring options. And they'd signed up for LiveKit Cloud a few weeks before, and I didn't know, because they used a personal Gmail account. Wow.
But they ended up, you know, building kind of the first proof of concept for Voice Mode on LiveKit Cloud and got excited and convinced that it was a solid platform from a small team, and it was good software. And I guess, you know, definitely being open source and them being able to see our code helped convince them of that. But we started to have commercial discussions, and, you know, ever since, we've been working with them on all of the multimodal real time stuff. And so that's an amazing story for sure. And what's interesting is, after I talked to them, I'm like, wait a sec.
I do have a Gen AI story. Hey.
Where's my 200 mil?
Yeah. Where's my 200 mil? I'm ready to sell. No.
It's 400 now.
Yeah. And then, you know, that just changed the trajectory of the company. I mean, obviously, I became a lot more stressed because I have so much respect for what OpenAI is doing, and I wanted to not let them down. You know? Because I told them, hey. Like, we're gonna make this work, we can scale and all of that. And that's another story too, which is kinda funny. But I'm not sure I'll tell that one on this episode. Maybe we'll save it for the next one. But yeah.
Like, I think that in that moment when we started to work with OpenAI on voice mode and when I tried the first prototype of it, you know, I guess it was, well, yeah. I tried the first prototype. It's just funny because they had, you know, Sky's voice in there. And I tried it, and I was like, man, this does sound like ScarJo. But it was not.
It is not ScarJo. It wasn't. And, like, I know that for sure because I was, you know, there in the early days of it. But I also was like, I like this voice. It really does sound soothing.
In any case, the realization, and I've said this, you know, in other conversations before, but, like, the realization was, okay. OpenAI is gonna build AGI. AGI is a synthetic human being. Right? If you take it all the way to its end, maybe there's some people who don't believe that that's gonna happen, but let's just say, at least in my opinion, for talking purposes, AGI is this synthetic human being.
It's a computer that can behave like, communicate with you like a human being, teach you things like we just learned about lightning, all those kinds of stuff. If you have a computer that's smart, how are you gonna interact with it? You're probably not gonna, like, use a keyboard and a mouse to interact with that computer. Like, that's what you do when the computer isn't very smart, can't understand you naturally, so you have to adapt yourself to give it information so it can work for you. Mhmm.
But when the computer is really smart and like a human, indistinguishable from a human maybe at, you know, the end of all of this, you're gonna interact with it the way that you interact with other human beings. And the way you interact with other human beings is through your eyes and your ears and your mouth. Like, you talk to other humans. You see them move. And the equivalent sensors for a computer, a human-like computer, to eyes, ears, and a mouth are cameras, microphones, and a speaker.
And it just turns out that kind of what we were working on for video conferencing and live streaming can be used as the eyes, ears, and mouth of multimodal AI or, like, a human-like computer. You know, we're basically taking the camera and the microphone and hooking those things up to the Internet and allowing the data that streams out of those to be transported anywhere else in the world. And so that's what our infrastructure was designed to do, albeit for a very different use case, and now it's gonna be used for this use case. And I think that this use case, in particular, is gonna be everywhere. The future, of course, will contain humans connecting to other humans in applications, but I think the vast majority of real time is going to be humans connecting to machines.
Whether those machines are personal devices, whether those machines are models running in the cloud, or whether those machines are humanoid robots walking around, you know, the Earth and trying to help us in the physical world. In all those cases, you're going to interact with that computer using your natural human IO. And, yeah, we're kind of like the wires of the nervous system in that brain analogy. The model's the brain, and we're the nervous system.
Yeah. I wanna ask you about the scaling too. I definitely wanna ask that. But just to kinda dig in, because I remember a few years ago, I got kind of, like, hyped around Alexa, because I think after university, there was a point where I had, like, really bad pains in my arms, like, this pain you get after sitting at a laptop so long. Yeah. And I got really into, like, you know, can you just be really productive?
And I remember just being as I got into, like, looking into the Alexa ecosystem, it just got, like, it's kind of just demotivating. It felt like they'd kind of abandoned it. And it was just there there just wasn't much going on there. It was like the kind of things were just about novelty and, you know, I still use it to like set alarms and ask for the time and stuff like that. But like Yep.
What do you think the future could be with how we interact, how we do our work, how we live with, you know, the GenAI stuff?
Yeah. I think that there's two kind of stages to how this progresses, in my opinion. One thing I'll say is, with the Alexa example you were talking about, right, you still use it to set alarms and stuff like that. I still use Siri for the same, but both of them have been pretty bad historically. I sometimes tell people that that's why I believe that in a lot of cases, hallucination is a feature, not a bug.
Because when you were interacting with Alexa, like, ten years ago, right, probably when you had one? That's when I think I got mine. And when you interact with that system, it's just this, like, giant expert system. It's like a ton of if-else statements. Right? And so Yeah.
If you hit a branch in that tree that it can't handle, it says, like, sorry, can't do that, or whatever the response is. The penalty from a UX perspective of not being able to answer a query is so high that it erodes your trust in that system, to where you just stop asking it pretty much anything. You only ask it the things that you know explicitly it can do, and you're probably only aware of, like, five things. It can definitely do more things, but you only know probably about five of those things, and that's the only thing you'll use it for, because you had some experience where you asked it for the sixth thing and it couldn't do it.
And so you're just like, forget this thing. It sucks.
Yeah. And it always talks for ages when it can't do it.
That's right. Yeah. It just really drones on with the response, you know, really very profusely apologetic. But when you talk to an LLM, right, if you even use Advanced Voice Mode, I mean, there's almost nothing that it doesn't know something about now, but it just hallucinates an answer to you. Like, you always get some kind of response, you know, to your query, whether it's correct or not or whether it's exactly what you wanted.
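The UX difference being described is easy to caricature in a few lines: the expert system fails closed on any branch it doesn't have, while the LLM always returns something. Both handlers below are made up for illustration:

```python
# Toy contrast: an Alexa-style if/else expert system vs. an LLM.
# Both handlers are invented for illustration; neither is a real API.
def expert_system(query: str) -> str:
    # A big if/else tree: anything off the known branches fails closed.
    if "alarm" in query:
        return "Alarm set."
    if "time" in query:
        return "It's 9:41."
    return "Sorry, I can't do that."  # the UX penalty that erodes trust


def llm(query: str) -> str:
    # An LLM produces *some* answer to almost anything (stubbed here;
    # a real call would go out to a model API), correct or not.
    return f"Here's my best attempt at {query!r}..."


for q in ["set an alarm", "why is the sky blue?"]:
    print("expert:", expert_system(q))
    print("llm:   ", llm(q))
```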
You get something, and the something is not "I don't know." And so hallucination helps you almost avoid this negative UX penalty that erodes your trust, and you'll kinda keep trying, or maybe phrase your question or your prompt a little bit differently, etcetera. So I think that that's a really important feature. Where I think this is all going, right, with voice interfaces, and, you know, you talked about the pain in your arms, like, maybe RSI or something from, like, typing on a keyboard a lot, etcetera.
You know, we have this vibe coding now. Like, you know, maybe it's gonna become like voice coding, but vibe coding sounds a lot better, so I don't think we'll change the term.
For now. I feel like it's getting pretty... I feel like we need a new word already. It's just, like Yeah. Inundated.
Yeah. Maybe maybe the entire just the word coding goes away. Right? I think that's what ends up happening. It's just vibe.
I'm vibing right now. But, you know, I think that the first stage of this, especially for creatives and people that are building stuff, is, like, the copilot use case. So what's interesting, I was at a dinner yesterday, and we were talking about this a lot, because the people at the dinner were all kind of working in the codegen space, or a lot of them were working in the codegen space. And one of the parts of the conversation was that you have something like Cursor today where it's your copilot, and it's helping you write code. Right?
And then there's the vibe coding version of this where you're kind of just hitting accept on whatever changes it has and kind of correcting it or trying to reprompt it if it's getting a bit lost. But you are kind of like the editor in that experience where it's doing a lot of the mundane work, and then you're the editor hitting accept, accept, accept. But it's still driven by you. I think that that's the key part of the relationship with the AI model today and the way that you write software: still driven by you. There is a flip that is happening, though, and I think, like, Replit's agent is one example of this.
Maybe to a degree, v0 is also like this, but Devin is certainly like this, and Claude Code is like this, where you're still pair programming, but you're not the one driving. It's actually the AI that is driving. And so you're kinda sitting there, sitting back, watching, supervising, making sure that it's doing the right thing, you know, changing the right files, making the right decisions, testing it properly, all of that stuff. And so that, I think, is still a copilot experience. You're still supervising and, you know, synchronously working with the AI or looking over its shoulder.
That's the first step, I think, to where things are going. And I think, like, all this reinforcement learning, kind of verification of the results, is gonna help very quickly accelerate codegen and, like, the AI's ability to be the driver in that pair programming example or use case. Then after that comes, like, the full agentic experience, right, where it goes from copilot to coworker, where you say, hey. You know what? I wanna build this app, and here's kind of how I'm thinking about it, and, like, go, you know, work on it.
Give me, like, a first draft of it or come back with, like, a proof of concept. And then it goes and it starts to do its stuff, and you're not really, you're going off and using your time for other things, and then it comes back, and it's like, okay. I've got, like, an initial proof of concept here that I've put together and all that stuff. And what do you think? Like, what should we tweak?
What should we refine? Let's have a meeting and sit down and talk about it. And just like you would with a human, you would do the same thing with your AI coworker, and then it'll go off and continue doing its work again and the refinement steps. So I think that that's, like, the direction that these things are going. And I think for a lot of interfaces, for creatives especially, it's gonna be less important to be an engineer, and I think product design is going to become the skill to hone for a human.
Kind of, you know, I guess, like, a very clear example of this is, if you watch, like, the Iron Man movies, it's the way that he, you know, even just designs the suit in the first place, yeah, using Jarvis to do most of the mechanical work. Yeah. I thought you were waving your hand in front of the camera.
Oh, no. I was doing the Tony Stark. He does do the kind of, like, stuff with his hands as well. Right? Like the
He does. Well, crazy story is that one of my teammates was interviewing an engineering candidate, and they weren't blinking. And, like, their face was very kinda static. And so she said, hey. Do you mind, like, moving your hand in front of your face really quick? Just, like, wave your hand in front of your face. And the person just disconnected. It, like, just disappeared. What? So
what were they doing?
They were an avatar. A generated 3D, like, live video avatar.
Woah. I kinda got like a shiver when you said that. I felt like it's super creepy. You think you're talking to a human and you're talking to an avatar.
It's really weird.
Yeah. Wait. Anyway, should we both just do it? Just Yeah.
Yeah. Are you asking me to, like, that answer was so good that you're, like, asking me to, like, wave my
Okay. Wait. You did do it. I didn't have to ask you, but I've just verified. Russ has done it as well.
I am real. I am here. I'm a real human being. Okay.
Yeah. That makes sense. And I guess it's like, you know, when you're collaborating, you either do it in Slack, reviewing someone's pull request, that sort of stuff, or, probably more, it's just, like, standing behind someone and looking at
Yeah. The screen. Right? I think those processes are probably still going to exist as part of, like, the software development life cycle, but it's just gonna be an AI that's doing, you know, some of the components of it or parts of it. It's certainly not gonna do all of it.
I think it still requires, like, human oversight. And, ultimately, like, the things that we are building, we're building for other human beings. We're building for each other. So there's gonna be a human in the loop somewhere.
Yeah. Yeah. Yeah. I hope so. I hope we still get involved. It seems like it's gonna be fun.
Have you been vibe coding at all or no?
Oh, yeah. I mean, yesterday, literally, I had to try it. I've never used Grafana before, and I was, like, using it to generate, like, a massive, massive dashboard, and it was, like, literally running out of, like, context doing it. It was just, like Oh, wow. And I understood absolutely zero, but I was just like, okay. Well, if I can see what it's doing and then I can test that it seems to do what I need, that's perfect.
Yeah. And
it's actually a better experience, because I've never really enjoyed creating dashboards and stuff like that, whatever it was. You're just, like, clicking around. And so if I can just say, here's what I want, just generate the code, it's just a much better experience than
Yeah. Agreed.
Yeah. Yeah. I wanna hear about what it was like, because, I mean, I remember the first time I saw it, I was actually at the AI Engineer Summit, and I think Roman from OpenAI, like, demoed the interruption, which I'd never seen. I remember that being, like, a crazy moment.
Oh, yeah.
And then I also remember when it dropped completely, with the Scarlett Johansson stuff, but, like, that must be, like, nightmare levels of, like, infrastructure that you need to just, like, make sure it's working. Was it just, like
You know, I was really proud of the team. So we definitely knew, I mean, OpenAI scale is insane. So, like, we definitely were like, okay. We need to go back, look at our system, make sure that we are architected properly to handle, like, the scale. And so we have a couple of folks on the team that are just insane distributed systems engineers. I don't know how we convinced them to work with us, but, like, they're just really next level.
That's awesome.
You definitely can't vibe code the stuff that they do. And so, I mean, I don't even know how you would come up with the initial prompt.
But so they're gonna be safe for a while.
Yeah. We definitely made some refinements and some changes ahead of time, kind of anticipating what that scale would be. We also power voice for Character AI, and they kind of started to also deploy around the same time as voice mode. A little bit after, but around the same time. And Character AI, I mean, has, like, insane scale as well. I don't know how many people know that, but their scale is wild.
Avatars of like different anime characters and stuff. Right?
Yeah. Yeah. Exactly. Like, AI companions and stuff like that. And so, like, both of them kind of ramping up at the same time was pretty nuts.
And I definitely lost a lot of nights of sleep through that process. But I think what I was so proud of the team about was that, like, when we first launched Voice Mode, it was for all paid users. And there was some other limiting factor to it as well. I can't remember what the limiter was, but, you know, it wasn't to every ChatGPT user. And so that was comforting to begin with.
But I remember that the whole thing with Sam happened, and the
That was around the same time. Right?
It was around the same time. And then the day that he came back, I guess earlier that day, the team was like, you know what? We're gonna... I was in an interview with a candidate, and then I got a message from the team, and they're like, Russ, we're ramping ChatGPT voice mode to all free users in fifteen minutes. How do you feel about that? And I was like, okay.
It had been running smoothly, and we'd been doing well. And, you know, it's not like OpenAI was just trying to, like, rug pull me or anything like that. They were just like, hey. You said it scales, Russ. You told us it scales, so, like, we're gonna ramp it up.
We're ready to go. And so my cofounder was in, like, the Cayman Islands on vacation, and I had already told the team, like, no shipping anything this week. And so this message came in the middle of that, his one-week vacation in the Cayman Islands. And so I, like, see this message. I pause my interview.
I'm like, hey. Can I take a five-minute pause right now? I gotta, like, ping the team. So I ping the team internally on Slack, and, like, they look at the stuff. Like, they look at the graphs, the Grafana charts, and they're like, yeah. We're good. We're ready to go. And so I'm like, are you sure? And they're like, yeah. Yeah. We're ready to go. So they ratchet it up, and it just scales. Like, no problems at all. Wow. And I was like, this team's amazing.
I had nothing to do with it. But, yeah, the team is incredible. Wow.
If there were any other lessons for someone that was about to onboard a customer that's, like, way bigger than they'd ever worked with, like, any lessons or things that you would advise from your experience doing that?
I think my lesson is probably counterintuitive. My lesson is embrace the chaos. Here's why. So I remember I was at Twitter, you know, in the early days. I was the second front end engineer, second client engineer at Twitter, and that team, I think, became over a hundred something eventually over the course of its lifetime.
And Twitter was just terrible. Like, the code, the architecture, everything was just so bad. It was, like, melting down constantly, like, the Fail Whale thing just constantly happening. And a lot of people ask, like, well, why did you write it this way? Why didn't you plan for this scale?
And I think there's this tendency for engineers to try to imagine and architect this, like, perfect system, but hitting a scaling wall or having something that you designed break is actually like a rite of passage. I think it takes way longer, it delays your go-to-market and finding product-market fit so much more, to try to devise a system or an architecture that would scale, and you don't even know if you need to. Like, Twitter, literally, I'm not even talking about just from a software perspective, but I'm very grateful to have worked there. Like, it's an amazing company, probably the best time in my entire career, including, I mean, I am enjoying LiveKit as well, but it was such a fun time.
I was an IC engineer. I was young.
Iconic company.
Such an iconic company. You know? Like, I it's just so rare to be able to be part of that, especially in the early days. So it's not like a diss on anybody, but that company literally screwed up every possible thing that there is to screw up except they just nailed the essence or the kernel of what that product's value proposition was going to be, and they just had insane product market fit. It was so locked in that you could just afford to screw up everything.
But what it taught me was, I mean, I guess it taught me exactly that: get the product-market fit right. That's the most important thing. You can go down. You can fail. You can design the wrong system.
You're gonna rewrite it. Like, that's all a rite of passage that proves that you have found some semblance of product-market fit, or you've built something that people want. Mhmm. And so now you have the honor or the privilege to be able to rewrite it. And so I think that my advice would be the learning that, I mean, yes, our system scaled and we prepped for it.
That was gonna be my question.
But I guess the point is that we didn't prep for it a year in advance. Yeah. We did a review. We looked at where the bottlenecks would be. And, you know, the part of the story that I didn't tell you is that, yes, OpenAI's, you know, voice mode scaled, but then Character AI came in, and we did have an outage, because there was one component that, like, we hadn't hit the wall of when OpenAI started to scale, but then we hit the wall of it when Character AI started to scale. And we had to rewrite the component, and we had to, you know, rethink how we were doing things, and we did it very fast.
Or, you know, my teammates did it very fast, so it's full credit to them. But that was a part of the journey too. Like, I was proud that we scaled to, you know, a certain level, but then we did hit a wall, and we had to, you know, do stuff differently to get to the next level. And that's just part of growth. It's part of, you know, building things.
Yeah. Hard. Yeah.
You have these unknown unknowns that, like, only come about when, you know, you get the real usage. There's another example I'll use here, which I think is a nice kind of, not a metaphor, but, like, an anecdote: I read Elon's biography, and in it, they have a chapter about SpaceX. And they talk about how when they were starting SpaceX, what he told the team was, like, go to the desert out in Texas and just explode rockets as quickly as you can. And you're like, what? And the reason why is to learn about what makes rockets explode so that you can build stuff to prevent rockets from exploding.
But the whole purpose is to uncover the, to make the unknown unknowns known... known unknowns? I don't know how to word that one, but it's to understand what makes rockets explode so you can move as quickly as possible to put the patches in or to build something that prevents all of these negative consequences or adversarial effects from happening. And I think that that's, like, a model for how I think about things too: go out there and, like, figure out what the requirements are as fast as you can, right, so that you can solve those problems. There's almost no way to envision all of the different failure modes in what you are building, especially in a complex system, which LiveKit is.
And so, yeah, I think we did it right.
Yeah. That is really cool advice. Yeah. Yeah. The question I still have is just, like, around, like, the infrastructure, like, if the rules are different, because I kind of always felt like, I don't know, if... yeah.
That makes sense for Twitter, but then does it make sense for, like, LiveKit? Or, like, you know, because you're, like, kind of selling, like, almost, like, you used, like, the word kernel. It's, like, almost, like, your kernel is, like, we don't wanna worry about this going down. Like
Yeah. You're totally right. So there is, like, it's not a totally fair kind of line to draw between Twitter and LiveKit, for sure, because Twitter, at the end of the day, is an application. And so the users that they sell to are consumers, right, that are using the product. And so they can go down, and I think users will be a little bit more forgiving and all of that.
But infrastructure kind of has this expectation that it's behind the scenes, and there are people that are building on top of this infrastructure. And while the developers building on top of your infrastructure might make mistakes and go down on their end users, they definitely don't want you to go down on them, because, you know, that's something that they don't control. Yeah. And so it's not quite the same thing as Twitter, and we do have to be more careful. And, you know, the stakes are also very high for us.
Like, I don't wanna suggest that, like, we're, like, willy-nilly. Like
Yeah. Yeah. Yeah.
Like, 911 runs on LiveKit. 25% of 911 Wow. Runs on LiveKit. And so, you know, it does have to be robust. Like Yeah. When someone's life hangs in the balance, there's definitely a lot of pressure to make sure that you've thought through some of these failure modes. I'm just yeah. You don't wanna go,
like, full in. Is it sort of, like, a balance? You wanna try to take some sort of, like
Yes. Move. And, of course, there's, like, also, like, all kinds of testing, both automated testing, manual testing, stress testing of the system. We have, like, simulation testing. Like, there's many layers of kind of the QA process that we go through before we, you know, roll something new out.
So there's also that. We're pretty careful, but we're not trying to engineer around every single possible scenario or scaling, you know, kinda wall that we may hit. I think the important part is to be able to see it coming and then be able to react appropriately before you actually just smash into that wall. Yeah. You know?
And we've done that successfully, where we've seen it coming and worked around it or scaled over the wall. And then, you know, I mentioned the Character AI one. Like, we definitely smashed into the wall at least once.
Yeah. And I'm sure, to be fair, like, the fact that what they're building is so... you've got such a strong kernel that you're gonna be way more reliable than them building it internally anyway. So it's like Oh, a hundred percent.
Yeah. A hundred percent that. That's definitely true.
Yeah. Yeah. That's super cool. Russ, I know we're coming to the end. So is there anything that you wanted to shout out about LiveKit or anything like that?
Yeah. I mean, we're building cool stuff. Like, I think the developers out there are building even cooler things using the product and the infrastructure. It's all fully open source, so you can build it, deploy it yourself, host everything, or you can use LiveKit Cloud, where we make it a lot easier and save you some time, and you can focus on your product. You know, I would love to chat with anyone out there that is building with LiveKit or thinking about it and see if I can be helpful.
So find me on X. I always wanna call it Twitter, but, yeah, find me at x.com/dsa. Right to left on your keyboard, three letters. Helps to have been an early employee.
I was about to ask you. You must have a good... yeah. Yeah. That's wicked.
I could tell you a funny story in the last minute if you want. Yeah. Yeah. Let's hear it. Back in the day, it was total Wild West at the company, and we had, you know, we could access the production database. And so Sierra Nevada, the beer company, they didn't have, like, a great username. I don't know what it was. It was, like, underscore underscore Sierra Nevada something and 123. I don't know what it was.
But one of my coworkers, I guess, I used to pair program with him a lot, he went into the database, and the person who had Sierra Nevada, he, like, changed their username to something else, and then he gave Sierra Nevada, like, @SierraNevada. And then a week or something later, this random guy comes in through the elevator to the office, and he's got, like, this dolly, and he's got 10 boxes, like, 24-packs, of Sierra Nevada beer. And he's like, is this person here? And, like, we're like, yeah.
He's right here. And then, like, this guy wheels in. He's like, this is a thank you from Sierra Nevada, and, like, drops it off in the office. Oh
my god. It's pretty funny. That's incredible. Have you ever had, like, people trying to get hold of your Twitter handle? Like, I don't know if you ever listened to Darknet Diaries, there's a whole episode about people getting, like, sent pizza, like, constantly as a threat. Like, you know, they'll just order pizza to your house. It was an attack until you give up the name.
Oh, I've never had anybody go to those kinds of lengths. But when I first joined the company, one of the things that they allowed you to do was you could choose your own username. As long as it had been inactive for six months, they would change it. And so, you know, it's early days. And I was the only person, I don't know why, in the cohort.
I think I was, like, the seventy-fifth or eightieth employee or something. I was the only person who just started to scan all of the single-letter usernames, and @u was open, and so was @x. I should have taken @x. I mean, I could have sold that to Elon, but maybe. But @u was open, and so I tried @u, and I just got so much spam from people that did not know how to at-mention people on Twitter.
So I was like, hey. Whatever. And they would write their username without the at symbol and then say "at you." And so my notifications were just destroyed with that name. So now I'm like, I need to change it to something else. Okay. I'll change it to my last name. It'll be @dsa. Yeah. And @dsa collides with two groups.
One is the Down Syndrome Association. And so that one, maybe I would give up my username to. But the other one is the Democratic Socialists of America, and that DSA has been pinging me for, like, ten years, like, trying to get DSA from me. It's, like, Bernie Sanders folks and stuff. You know, I don't talk politics, and I have nothing against them, but they definitely have been trying to get my username.
Yep. Yeah. Just, yeah, give it to the Down Syndrome Association.
Yeah. Yeah. That's not, if you're gonna...
Close to my heart, for sure. Yeah.
Yeah. Yeah. Definitely. Yeah. Russ, that was super fun.
Yeah, man. It was.
Really, really awesome. Thank you very much. Thanks for joining. Thanks, everyone, for listening.
Thank you so much, everyone.