
Building Windsurf with Varun Mohan

May 07, 2025 · 1 hr 28 min

Summary

Varun Mohan, CEO of Windsurf, discusses the engineering challenges of building an AI-native IDE. He shares insights on evaluating LLMs, optimizing for latency, and fostering a culture of embracing failure. Mohan also explores how AI tools are transforming software engineering and the skills that will remain valuable.

Episode description

Supported by Our Partners

• CodeRabbit — Cut code review time and bugs in half. Use the code PRAGMATIC to get one month free.

• Modal — The cloud platform for building AI applications

What happens when LLMs meet real-world codebases? In this episode of The Pragmatic Engineer,  I am joined by Varun Mohan, CEO and Co-Founder of Windsurf. Varun talks me through the technical challenges of building an AI-native IDE (Windsurf) —and how these tools are changing the way software gets built. 

We discuss: 

• What building self-driving cars taught the Windsurf team about evaluating LLMs

• How LLMs for text are missing capabilities for coding like “fill in the middle”

• How Windsurf optimizes for latency

• Windsurf’s culture of taking bets and learning from failure

• Breakthroughs that led to Cascade (agentic capabilities)

• Why the Windsurf team builds their own LLMs

• How non-dev employees at Windsurf build custom SaaS apps – with Windsurf!

• How Windsurf empowers engineers to focus on more interesting problems

• The skills that will remain valuable as AI takes over more of the codebase

• And much more!

Timestamps

(00:00) Intro

(01:37) How Windsurf tests new models

(08:25) Windsurf’s origin story 

(13:03) The current size and scope of Windsurf

(16:04) The missing capabilities Windsurf uncovered in LLMs when used for coding

(20:40) Windsurf’s work with fine-tuning inside companies 

(24:00) Challenges developers face with Windsurf and similar tools as codebases scale

(27:06) Windsurf’s stack and an explanation of FedRAMP compliance

(29:22) How Windsurf protects latency and the problems with local data that remain unsolved

(33:40) Windsurf’s processes for indexing code 

(37:50) How Windsurf manages data 

(40:00) The pros and cons of embedding databases 

(42:15) “The split brain situation”—how Windsurf balances present and long-term 

(44:10) Why Windsurf embraces failure and the learnings that come from it

(46:30) Breakthroughs that fueled Cascade

(48:43) The insider’s developer mode that allows Windsurf to dogfood easily 

(50:00) Windsurf’s non-developer power user who routinely builds apps in Windsurf

(52:40) Which SaaS products won’t likely be replaced

(56:20) How engineering processes have changed at Windsurf 

(1:00:01) The fatigue that goes along with being a software engineer, and how AI tools can help

(1:02:58) Why Windsurf chose to fork VS Code and build a plugin for JetBrains 

(1:07:15) Windsurf’s language server 

(1:08:30) The current use of MCP and its shortcomings 

(1:12:50) How coding used to work in C#, and how MCP may evolve 

(1:14:05) Varun’s thoughts on vibe coding and the problems non-developers encounter

(1:19:10) The types of engineers who will remain in demand 

(1:21:10) How AI will impact the future of software development jobs and the software industry

(1:24:52) Rapid fire round

The Pragmatic Engineer deepdives relevant for this episode:

IDEs with GenAI features that Software Engineers love

AI tooling for Software Engineers in 2024: reality check

How AI-assisted coding will change software engineering: hard truths

AI tools for software engineers, but without the hype

See the transcript and other references from the episode at https://newsletter.pragmaticengineer.com/podcast

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [email protected].



Get full access to The Pragmatic Engineer at newsletter.pragmaticengineer.com/subscribe

Transcript

A lot of people talk about how we're going to need fewer software engineers in the near future. I think it feels like it's people that hate software engineers, largely speaking, that say this. It feels pessimistic, not only towards these people, but I would say just in terms of what the ambitions for companies are. I think the ambition for a lot of companies is to build a lot better product. And if you now give the ability for companies

to now have a better return on investment for building technology, right? Because the cost of building software has gone down. What should you be doing? You should be building more, because now the ROI for software and developers is even higher, because a singular developer can do more for your business. So technology actually increases the ceiling of your company much faster. Windsurf is one of the popular IDEs for software engineers

thanks to its AI coding capabilities. But what are the unique engineering challenges that go into building it, and how could tools like Windsurf change software engineering? Today I sat down with Varun Mohan, co-founder and CEO of Windsurf. We talk about why the Windsurf team builds their own LLMs and how LLMs for text are missing capabilities necessary for coding, like fill in the middle. How Windsurf uses a mix of techniques for many cases,

like how, to solve for search, they use a combination of embeddings and keyword-based searches. Why latency is their number one challenge and how incorrectly balancing GPU compute load and memory load can lead to higher latency for code suggestions popping up. How Varun thinks the software engineering field will evolve, and why he stopped worrying about predictions like 90% of code will be generated by AI in six months.

If you want to understand the engineering that goes into these next-generation IDEs, then this episode is for you. If you enjoy the show, please do subscribe on any podcast platform and on YouTube. Welcome to the podcast. Thanks for having me on.

You've recently launched GPT-4.1 support in Windsurf, which by the time this is out, it will have been a few weeks. But what are your initial impressions so far? And in general, when you introduce a new model, how do you evaluate how it's working for the coding use cases that we all use? Yeah, maybe I can talk about the second part and then I can talk about, you know, GPT-4.1 and the other models afterwards.

Basically, internally, these models have these non-deterministic properties. They sometimes perform differently in different tasks in ways that are unexpected. You can't just look at a score on a competitive programming competition and decide, hey, it's going to be awesome for programming. And, you know, interestingly about the company, maybe this is going to be a helpful context. A lot of us in the company previously worked in autonomous vehicles.

And I think in autonomous vehicles, we had a similar type of behavior where you had a piece of software. The software was very modular, lots of different pieces. Each piece was machine learning driven, so there was some non-determinism. And it's very hard to test it in the real world.

Actually, it's much harder than it is to test, I guess, Windsurf out in the real world. It's much harder to test autonomous vehicle software out in the real world because if you ship bad software, you have the chance of hurting a lot of people, hurting, I don't know, just the general public. So in that case, we needed to build really good simulation evaluation infrastructure in autonomous vehicles. And I guess we brought that over here as well.

hey, if you want to test out a new model, we have evaluation suites. And the evaluation suites not only test end-to-end software performance, but just to say you give a high-level task. What is the pass rate of actually completing the high-level tasks on a bunch of unit tests? It also tests retrieval accuracy.

edit accuracy, right? Redundant changes. All these different parts of a model that are like negative behavior. Because for our product, it not only matters that you pass a test, it also matters that you didn't go out and make 10 steps that were unnecessary, because the human is going to be waiting on the other end for all of those.

So we have metrics for all of these things. And we're able to put each model through, I guess, a suite of tests that give us metrics. And that's the way we decide, hey, this is a good model for our end user. Right. And that's like the high level way that we go about testing. And like these tests, you know, they sound great in theory, but in practice, what does it look like?

I'm going to assume you're going to have, you know, we can imagine us engineers who've been writing, you know, code, probably not autonomous vehicles, but similar ones. You know, we know our unit tests, our integration tests. If you do mobile, you know your end-to-end tests. I'm assuming this will be a little bit... Different, but with some similarities. Do you actually code some scenarios? You have example codes, example prompts.

And then I assume we can do a bit of that, but then what else? And how does this all come together? And how can I imagine this test suite? Is it like one big giant blob that runs for I don't know how long? Yeah, one of the aspects of code that is really good is it can be run, right? So it's not like a very, you know, touchy-feely kind of thing. In the end, like, a test can be passed. So what we can do is we can take a bunch of open-source repositories. We can find...

Previous pull requests or commits that actually not only add tests, but also add the corresponding implementations. What we can do is, instead of just taking the commit description, we can remake what the description of the commit should have been, like a very high-level intent. And then from there, it becomes a very, I guess, programmatic problem, which is to say, hey, like...

First of all, find the right files that you need to go and make changes to, right? Then there is a ground truth for that, right? Because the base code actually has a set of five, ten files where changes were made. Then after that, what is the intent on those files? You can actually go from the ground truth backwards, which is that you know what the final change was from the actual code.

And you can have the model generate that intent. And then after that, you can see if the edit, given that intent, is correct. So you now have three layers of tests, which is that, hey, did I retrieve the right things? Did I have the high-level intent correctly? And was the edit performed correctly? And then you can imagine doing much more than just that.
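To make those three layers concrete, here is a minimal sketch of how such an evaluation loop could be wired up. This is not Windsurf's actual harness: the `retrieve_files`, `propose_edit`, and `run_tests` callables are hypothetical stand-ins for whatever models and test runners sit behind each stage.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalExample:
    """One mined pull request: a reconstructed intent plus ground truth."""
    intent: str                        # high-level description of what the change should do
    gold_files: set                    # files the original commit actually modified
    run_tests: Callable[[dict], bool]  # runs the commit's own tests against a candidate edit

def score_example(example: EvalExample, retrieve_files, propose_edit) -> dict:
    """Score one example on the layers described above: retrieval, edit, redundancy."""
    # Layer 1: retrieval -- did we find the files the real change touched?
    retrieved = retrieve_files(example.intent)
    retrieval_recall = len(retrieved & example.gold_files) / len(example.gold_files)

    # Layer 2: edit -- apply the model's proposed change and run the commit's tests.
    edit = propose_edit(example.intent, retrieved)
    tests_passed = example.run_tests(edit)

    # Layer 3: redundancy -- penalize unnecessary changes a human would have to wait on.
    redundant_files = len(set(edit["touched_files"]) - example.gold_files)

    return {"retrieval_recall": retrieval_recall,
            "tests_passed": tests_passed,
            "redundant_files": redundant_files}
```

Aggregating per-example scores like these over thousands of mined repositories is what turns "the model feels better" into the kind of metrics described here, available within minutes.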

At a high level now, just from a pure commit or a pure actual ground truth piece of code, you now have multiple metrics that you can track. And then obviously the final thing you can actually do is run the... So it's not just like when you measure some of these chat products. Actually, the evaluation is a little bit different, which is to say the evaluation is you give it to multiple humans in a blind test,

in an A-B test, and you ask them, which one did you like more? Obviously, for us to quickly evaluate, we can't be giving it to tens of thousands of humans in a second. And with us now, within minutes, we can get answers to what is the performance on tens of thousands of repositories. This episode is brought to you by Modal, the cloud platform that makes AI development simple.

Need GPUs without the headache? With Modal, just add one line of code to any Python function and boom, it's running in the cloud on your choice of CPU or GPU. And the best part, you only pay for what you use. With sub-second container starts and instant scaling to thousands of GPUs, it's no wonder companies like Suno, Ramp and Substack already trust Modal for their AI applications.

Getting an H100 is just a pip install away. Go to modal.com slash pragmatic to get $30 in free credits every month. That is modal.com slash pragmatic. This episode is brought to you by CodeRabbit, the AI code review platform transforming how engineering teams ship faster without sacrificing code quality. Code reviews are critical, but time-consuming. CodeRabbit acts as your AI co-pilot, providing instant code review comments and potential impacts of every pull request.

Beyond just flagging issues, CodeRabbit provides one-click fix suggestions and lets you define custom code quality rules using AST graph patterns, catching subtle issues that traditional static analysis tools might miss. CodeRabbit has so far reviewed more than 5 million pull requests, is installed on 1 million repositories, and is used by 50,000 open source projects. Try CodeRabbit free for one month at CodeRabbit.ai using the code PRAGMATIC. That is CodeRabbit.ai and use the code PRAGMATIC.

I really like how much engineering you can bring in because it's code, because we have repositories, because you can use all these things. It feels to me it gives it a bit of an edge over some of the other use cases, just as you mentioned. No, I think you're totally right. We think about this a lot: what would have happened if we were to pick a different sort of category entirely? It's just, I think, the ground truth is just very hard. You don't even know if the ground truth is great.

In some cases, for all we know, the ground truth is not good. But in this case, I think it's a lot easier because of the verifiability. If you have a good test, it's a lot easier to verify. Can you give us a sense of what is the team behind Windsurf, and also how complex this thing is, and how did it even come about?

For all I know, you know, like a few months ago when this podcast started, there was no Windsurf, there was Codium. We actually talked a bit about what Codium was, which was a little bit different. And then out of nowhere, boom, Windsurf comes out. A week later, already in the Pragmatic Engineer, about 10% of people that we surveyed were already using it, which was, I think, the second largest

usage of tools, and people are enthusiastic about it. But I assume there's more to this. It doesn't just come out of, you know, like nothing, right? Yeah, so happy to talk a little bit about our story and summarize it. So we started the company now close to four years ago, which is substantially before, I guess, the, you know, the Copilot and ChatGPT sort of moment. A lot of us at the company, as I mentioned, previously worked, I would say, on these hard tech problems, you know, AR, VR,

autonomous vehicles. And I guess at that point, what we started building out, and we had a different company name at that point, it was called ExaFunction. We started building out GPU virtualization systems. So we built out systems to make it very fast and efficient to run GPU-based workloads. And we would enable companies to run these GPU workloads on CPUs. And we would transparently offload all GPU computations to remote machines.

And that could be CUDA kernels all the way down to full-on model calls, right? It was a very low-level abstraction that we provided people. And so much so that if the remote machine died, we'd be able to reconstruct the state of what was on that GPU on another.

And the main use case we targeted were these large-scale simulation workloads for the deep learning workloads that a lot of these robotics and autonomous vehicle companies had. And we thought, hey, the world was going to look like that in the future. A lot of companies would be running deep learning workloads. What ended up happening was in the middle of 2022, I think Text DaVinci 3 sort of came out, which was, I guess, the, you know, the GPT-3 sort of instruction model.

And I guess that changed a lot of our priors, like both me and my co-founders' priors, which is to say we thought that the set of models that would run were going to look a lot more homogenous. If you were to imagine, in the past, the number of different models that people would run was very diverse. People would run convolutional neural networks, recurrent neural networks.

LSTMs, graph neural nets. There was a whole suite of different types of models. We thought in that case, hey, if we were an infrastructure company, we could make it a lot easier for these companies to run these workloads. But the thing is, with Text DaVinci 3, we actually thought that there would be a simplification of the set of models that would run. Why go out and train a very custom BERT model if you could go out and just ask a very large generative model, is this positive or negative?

And we thought that that was where the puck was going. I guess for us, we believe in scaling laws and all these things. If it's this good today, how good is a much smaller model going to be in two years? It's probably going to be way better. So what we decided to do was actually focus on the application layer, take the infrastructure that we had and actually build an application, and that was what Codium was.

So we built out extensions in all the major IDEs, right? And very quickly, we were able to get to that point. And we actually did train our own models and run them ourselves with our own inference. And the reason why we did that is, at the time, the models were not very good. The open models were not very good. And also for the workload that we had, which was autocomplete, it was a very weird workload. It's not very similar to the chat workload.

Code is in a very incomplete state. You need to fill in code in the middle of a line. There's a bunch of reasons why this workload is not very similar. And we thought we could do a much better job. So we provided that, because of our infrastructure background, for free to basically every developer in the world. And then very quickly, enterprises started to reach out. We were able to handle the security requirements and personalization, because the companies not only care about,

Hey, it's fast. It's free. But is this the best code for my company? And we were able to meet that workload. And then fast forward to today. And I know that this is a long answer. What we felt was agents in the beginning of last year would be very huge. The problem was the models were not there yet. We had teams inside the company building these agent use cases, and they were just not good enough. But the middle of last year, we were like, hey, it's actually going to be good enough.

But the problem is the IDE is going to be a limitation for... Because VS Code is not evolving fast enough to enable us to provide the best experience for our end user. In a world in which agents were going to write 90% or 95% of software, developers would still be in the loop. But the way they would interact with their IDEs would look markedly different. And that's why we ended up building out Windsurf

in the first place. We thought that there was a much higher ceiling on what IDEs could provide. And with the agent product, which is Cascade, we were able to deliver what we felt was a premier experience right off the bat that we couldn't have. How large is the team who's working on Windsurf and how complex is Windsurf as a product? I'm not sure how much we can quantify.

You know, I try to be pretty, like, you know, sort of modest with some of these things, but just to say we are a pretty small team. So right now the engineering team is a bit over 50 people. Maybe that's large compared to other startups, but if I were to compare it to other large engineering projects in the grand scheme of things, one of the books that I read a while ago was this book called Showstopper. Right. And it's this book about how Microsoft built Windows.

And it's a much larger team, obviously, but operating systems are a very complex piece of software. But my viewpoint on this is that this is a very, very complex piece of software in terms of where the goalposts are. Which is to say, I would say the goalpost is constantly moving, right? One of the goals that I give to the company is that we should be reducing the time it takes to build applications by 99%. And I would say pre-Windsurf, it was probably 20, and post-Windsurf, it was probably over 40.

But we are very far from 99, right? We're still, like, you know, a 60x away from 99, right? Like if there's 60 units of time and we want to make it 1, we're quite far. So in my head, there's a lot of different engineering projects that we have at the company. In fact, I would say maybe close to half of the engineering team is working on projects that have not seen the light of day.

And that's an interesting decision that I guess we've made because I think we cannot be embracing incremental. We're not going to win and be a valuable company to our customers if all we're doing is changing the location of buttons. I think people will like us for great UI, but that cannot be the only reason why.

No, I love it. I mean, this is, you know, when you're a startup, I think you need to say pretty quickly, you cannot just do incremental. You can do incremental later. Hopefully you're going to get there. And what are some interesting numbers that you can share about the usage of Windsurf or the load that you're handling? Because I'm assuming this is just going to... It's pretty easy to tell. It will keep going up and to the right. That's an easy prediction.

No, I think you're right. So one of the interesting numbers, or a handful of numbers, is within a couple months of the product, we had well over a million developers try the product. So it's been growing quite quickly. Within a month of pricing coming out, we reached over sort of eight figures in ARR.

I think all of those are kind of interesting metrics, but also on top of that, sort of, we run our own models still in a lot of places. Like you can imagine the fast passive experience is completely our own model. A lot of the models that go out and retrieve parts of the code base and find relevant snippets are our own models. And that system processes well over sort of 500 billion tokens of code every day right now. So that system itself is huge. It's a huge workload that we actually...

Yeah, and I guess the history of Windsurf is an interesting one. I understand that you've actually been building your own models for quite some time. You've not just started here. I think for most engineering teams, that would be daunting. And also, it's just a lot of time, right? Like, it's not something that you would just... like, it's harder to do from scratch. I'll say that because nothing's impossible here.

I totally agree with you. I think, you know, one of the weird things is, because of the time that we started and the fact that we were, like, in the very beginning... First of all, we had the infrastructure background, but we were first saying we need to go out and build an autocomplete model. The best model at the time that was open source,

End of 2022 was Salesforce CodeGen. And I'm not saying it was a bad model. It was awesome that Salesforce did open source that model, but it was missing a lot of capabilities that we needed for our product. Right. It was missing fill in the middle, which feels like a very, very obvious capability, but the model... What is that?

So the idea of fill in the middle is basically if you look at the task of writing software, it's very different than chat. And maybe an example of what chat is, you're always appending something to the very end and maybe adding an instruction. But the problem for writing code is you're writing code in ways that are in the middle of a line. In the middle of a snippet of code. In the middle of a function.

And the problem there is actually, there's a lot of issues that pop up, which is to say, actually, the tokenization, so these models, when they consume files, they actually tokenize the files, right? Which is to say they don't consume them byte by byte, they consume them token by token. But the fact that the code, when you write it... at any given point, doesn't tokenize into something that looks like in distribution. I'll give you an example.

How many times do you think in the training dataset for these models does it see, instead of return, R-E-T-U only, without the R-N? Probably never. It probably never sees that. So it's completely out of distribution. But we still need to, when we see RETU, predict you are going to do RN space a bunch of other... It sounds like a very small detail, but that is actually very important if you want to build a product.

And that is a capability that cannot be slightly post-trained onto the models. It's actually something where you need to do a non-trivial amount of training on top of a model or pre-train to get that capability. And it was table stakes for us to provide that for our users. So that forced us very early on to actually build out our own models and figure out training recipes and make sure we could run them at massive scale ourselves.
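As a rough illustration of what fill-in-the-middle training involves, the sketch below follows the general FIM recipe from the research literature: rearrange each training document into prefix, suffix, and middle with sentinel markers, so the model learns to generate the missing span given the code on both sides. The sentinel names and split strategy here are illustrative, not Windsurf's actual training format.

```python
import random

# Illustrative sentinel strings; real models reserve dedicated special tokens for these.
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_training_example(document: str) -> str:
    """Rearrange a file so the model learns to predict the middle given both sides."""
    i, j = sorted(random.sample(range(len(document) + 1), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # Training target: everything after <fim_middle> is what the model must produce.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

# At inference time the prompt stops at <fim_middle>, and the model completes it --
# even when the cursor sits inside a token, like the dangling 'retu' below.
before_cursor = "def total(xs):\n    retu"
after_cursor = "\n\nprint(total([1, 2, 3]))\n"
prompt = f"{PRE}{before_cursor}{SUF}{after_cursor}{MID}"
print(prompt)
```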

And what are other things that are unique in terms of building models for code, as opposed to the usual text models? I can think of things like the brackets, for example, in some languages. Maybe this is just naive. You have all seen so many more. So what makes code... what makes it interesting slash worthwhile to build your own model for code?

Yeah, I think what you said is definitely one thing. The fill-in-the-middle capability, I would say, is another thing. Another thing you can do is, code is like... quite easy to, and quite easy is maybe, you know, an overstatement, but quite easy to parse, right? You could actually AST-parse code. You can find relationships of the code. Because code is a system that has evolved over time, you can actually look at the commit history of code to build a knowledge graph of the code.

And you can start putting in these 3,000... What? Did you do that? Yep. Yeah. Yeah, yeah. We look at the previous commits. And one of the things that it enables us to do is build a probability distribution over the code base: conditional on you modifying a piece of code, what is the probability of you modifying another piece of code? So there is... You know, when you get into the weeds, code is very information dense.

It's testable. There's a way that it evolves. People write comments, which is also cool, which is to say once a pull request gets created, people actually say, I didn't like this code. So there's a lot of signal on what good and bad looks like within a company. You can use that as a way to automatically make the product much better for companies.
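A minimal sketch of that co-change idea: from commit history you can estimate, conditional on one file being modified, the probability that another file gets modified too. The toy history and file names below are invented, and a real system would mine git history and work at a finer granularity than whole files; this just shows the shape of the statistic.

```python
from collections import Counter, defaultdict
from itertools import combinations

def co_change_model(commits):
    """Estimate P(file B is modified | file A is modified) from commit history.

    `commits` is a list of sets of file paths, one set per historical commit.
    """
    file_counts = Counter()
    pair_counts = defaultdict(Counter)
    for files in commits:
        file_counts.update(files)
        for a, b in combinations(sorted(files), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1

    def p_modify(b, given_a):
        if file_counts[given_a] == 0:
            return 0.0
        return pair_counts[given_a][b] / file_counts[given_a]

    return p_modify

# Toy history: editing the handler usually means its tests change too.
history = [{"api/handler.py", "tests/test_handler.py"},
           {"api/handler.py", "tests/test_handler.py", "api/models.py"},
           {"docs/readme.md"}]
p = co_change_model(history)
print(p("tests/test_handler.py", given_a="api/handler.py"))  # -> 1.0
```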

All of this is stuff we were talking about, I would say, a couple of years ago. And I guess we've been here in this space quite a long time. I know a couple of years is not a very long time in most categories, but in this category, it's dinosaur years.

One of the things that I think is kind of interesting is, in the beginning, we were saying, hey, people would write all these guidelines and documentations on how best to use the product. But the interesting thing is, code is such a treasure trove. You can go out and probably make a good first cut.

on what the best way to write software is inside JPMorgan Chase, inside Dell. You can go out and do that by using the rich history inside the company. So there's a lot of things that you can start doing autonomously as well, if that makes sense. Yeah, one thing I'd love to get your take on is how it might have changed.

A year or two ago, when Copilot started to become more popular, again, an earlier version, companies like Sourcegraph and others started to build other capabilities. There was this debate of, would it be worth fine-tuning

a model on my company's code base, talking about large companies, JPMorgan or those others. And there were two trains of thought. One said, like, oh, it's probably worth it because our code is so unique, and some other people would think it might not be worth it because

It might be too resource intensive. The models are too generic. Did you try this out? And where did you land in this? Because I never got an answer to what happened, what was worth it, what was not worth it. So for what it's worth, we did try it out. We built out some crazy infrastructure to go out and try it out. I guess this will be the first place where I talk about the actual infrastructure. We built out systems so transformers have these many layers, right?

And if you were to imagine, when we actually enable companies to self-host... At some point in the past, we were enabling companies to self-host the system and the fine-tuning system as well. So, at that time... You built this out? We built out self-hosted, not only deployment, but also fine-tuning. And the way that that actually worked

was actually quite crazy, which was to say, okay, where do you get the capacity to fine-tune a model if you're already running it for inference? The company may not want to give you so many GPUs. So we just said, hey, why don't we use

the preemptible time? Which is to say, when the model is not running inference, what if we actually go out and do backprop on the transformer model while this is happening? And then what we found was, oh, the backprop steps take a long time and it might cause downtime on the inference side.

So what we enabled was, we enabled the backprop to be preemptible on every layer of the transformer. Which is to say, let's say you send an inference request while it's doing backpropagation and it's on layer 10. It'll just stop at layer 10 and it'll continue after your inference request completes. So we built a lot of crazy systems to actually go out and do this.
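The control flow of that idea can be sketched in a few lines. This is a toy illustration only: an integer loop stands in for the per-layer backward pass, and a flag stands in for however the real serving stack would signal an incoming request. It is not ExaFunction's actual scheduler.

```python
import threading

class PreemptibleFineTuner:
    """Toy sketch: backprop proceeds layer by layer and yields to inference."""

    def __init__(self, num_layers: int):
        self.num_layers = num_layers
        self.inference_pending = threading.Event()
        self.resume_at = num_layers - 1   # backprop runs from the last layer down to the first

    def serve_inference(self, request: str) -> str:
        self.inference_pending.set()                 # ask training to pause at the next layer boundary
        completion = f"completion for {request!r}"   # stand-in for the real forward pass on the GPU
        self.inference_pending.clear()
        return completion

    def run_backprop(self) -> bool:
        """Return True if a full backward pass finished, False if it was preempted."""
        layer = self.resume_at
        while layer >= 0:
            if self.inference_pending.is_set():
                self.resume_at = layer               # remember where to pick up later
                return False
            # ... backward pass through `layer` would run here ...
            layer -= 1
        self.resume_at = self.num_layers - 1         # ready for the next full pass
        return True
```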

I guess here's the thing we found. We found that fine-tuning was a bump, but it was a very modest bump compared to what great personalization and great retrieval can do. That's what we found. Now, does that mean fine-tuning in the future is not going to be valuable? I think actually per-person fine-tuning could work quite well. I think, though, maybe some of the techniques that we use to do it are going to need to change. And here's the way I like to look at it, right?

Any time you build a system, there are many ways to improve it. Some of them are much easier than other ways. And you can imagine there's a hill to climb for everything. And some hills are much easier.

And the right strategy to do when a hill is much easier and it provides a lot of value is climb that hill fully before you go out and do something that's a lot harder. Because when you do the thing that's a lot harder, you are like adding some amount of tech debt if that's not the right solution.

What I described to you in terms of the solution of doing backprop on a layer-by-layer basis, it's a cool idea, but you can imagine it added a lot of technical complexity to the software that might have been unnecessary if we thought that purely doing better retrieval was going to be much better. So there's this, like, I guess there's this tightrope to kind of, you know, balance on top of on how you decide.

Now, I was asking around. I've been using Windsurf as well, but I'm not a very heavy user, but I have been asking around more heavy users, and one of the biggest criticisms, both of Windsurf but also of every tool in this area, has been like, look, I start off, it's good. It works well. I have a relatively small project. My project grows, either because Windsurf generates code or it's just a big project. After a while, it starts to struggle with the context.

Maybe it doesn't see, you know, parts, it gets confused, etc. And clearly, as an engineer, I understand that it is going to be a problem of, like, you have a growing context window, you still want to have similar quality. How do you tackle this challenge? What progress have you made? I think this is a bit of a million dollar question in the sense of, if we could somehow have a solution for this, we would be better off. Where have you gotten on this?

I'm assuming this is a pretty common challenge and typical. I think it's a very hard problem. You're totally right. There's a lot of things that we can do, which is to say, obviously, we need to work around the fact that the models don't have infinite context. And when they do have larger and larger contexts, you are paying a lot more and you take a lot more time.

Right. And developers usually a lot of the time don't really want to wait. And, you know, one of the things that we have for our products. We hate waiting. Yeah, exactly. But one of the things that we have for our products that we've learned is if you make a developer wait, the answer better be 100% correct. And I don't think we're at a time right now where I can guarantee you with a magic wand that all of our cascade responses are 100% correct.

I don't think we're at that right now. So there's a lot of things that we need to do that are almost approximations, right? How do we keep a very large context? But despite that, we have chats that are so long that, how do you accurately checkpoint the past conversation? But that has some natural lossiness attached to it, right? And then similarly,

if the code base gets very large, how do we get very, very confident that the retrieval is very good? And we have evaluations for all of these things, right? This is not something where we're shooting in the dark and being like, hey, YOLO, let's try a new approach and give it to half of our users. But I think you're totally right. I don't think there's like a complete solution for it. What I think it's gonna be is like a mixture of a bunch of things,

which is to say much better checkpointing coupled with better usage of context length, much faster LLMs and much better models. So it's not going to be, I think, a silver bullet. And by the way, that could be tied with, hey, you know, understanding the codebase much better from the perspective of, if the codebase already existed,

Able to use the knowledge graph, right? Able to use a lot of the dependencies within the code base a lot better. So it's a bunch of things that I think are going to multiply together to solve the problem. I don't think there's going to be like one silver bullet that makes it so you're going to be able to have amazingly coherent conversations that are very, very long.
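One way to picture the checkpointing part of that mixture is the sketch below: keep the most recent turns verbatim and collapse everything older into a lossy summary once the conversation blows past its token budget. The `summarize` and `count_tokens` callables are hypothetical stand-ins for an LLM summarization call and a tokenizer; this is a shape-of-the-idea sketch, not Windsurf's Cascade implementation.

```python
def checkpoint_conversation(turns, token_budget, count_tokens, summarize):
    """Keep recent turns verbatim; fold older turns into a lossy checkpoint."""
    if sum(count_tokens(t) for t in turns) <= token_budget:
        return turns                      # everything still fits; nothing to do

    # Walk backwards, keeping as many recent turns as fit in half the budget.
    kept, used = [], 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > token_budget // 2:
            break
        kept.append(turn)
        used += cost
    kept.reverse()

    older = turns[: len(turns) - len(kept)]
    checkpoint = summarize(older)         # the natural lossiness lives here
    return [checkpoint] + kept
```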

To be fair, as an engineer, this might feel weird, but it makes me feel a bit better. We're actually back to talking about engineering step-by-step as opposed to having these... It feels like you get a new model. Not now, but early on when we got a new model, it was like, oh my gosh, it's magic. And it took a while to understand how it works, how it's broken down, etc.

You did mention your infrastructure. Can you talk a little bit about how we can imagine your hardware and backend stack if I was to join Windsurf as an engineer? Is it going to be a bunch of cloud deployments here and there? Do you self-host some of your GPUs? A lot of AI startups who are smaller or more modest, they're just going to be a platform as a service. It sounds like you might be at the scale where maybe you're outgrowing this as well.

Yeah, I think we might have just never done kind of, you know, buying off the shelf stuff in the early part of the company. Your background, I keep forgetting this. Yeah, but even more than the background, I think there were cases where we could have and maybe should have. One of the reasons why...

we also didn't was very quickly, we got brought into working with very large enterprises. And I think the more dependencies you have in your software, it just makes it harder and harder for these larger companies to integrate the technology.

They don't want a ton of sub-processors attached to it. We recently got FedRAMP compliant, FedRAMP High compliance. We're the only AI software assistant with FedRAMP High compliance. And the only reason why that's the case is we've kept our systems very tight. Right. And then for these compliances, I did some, but not specifically FedRAMP. What do you need to prove that you are this compliant? Yeah, I think basically you need to map out the high levels of sort of all the interactions.

You need to be very methodical about releases and how the releases make it into the system. You need to be very methodical about where data is persisted, at a layer that is probably much deeper than SOC 2. I think, like, going through SOC 2 versus FedRAMP... I did SOC 2 and that was already pretty long. It's impressive that you did this at a startup scale, so congrats.

Yeah. One of the reasons why was, I guess, like, one of our first customers that were a large enterprise was Dell, right? Which is not a usual first large enterprise, and I guess the startups know. For a startup, definitely not. So it forced us down a path of, how do we build very scalable infrastructure? How do we make sure our systems work at a code base that is 100-plus million lines of code?

What does our GPU provisioning need to look like for this larger team? It's just forced us to become a lot more, I guess, operationally sound for these kinds of problems. Yeah. And how do you deal with inference? You're serving the systems that serve probably billions or hundreds of billions of tokens per day, as you just said, with low latency. What kind of optimizations have you looked into?

Yeah, I mean, like a lot, as you can imagine. One of the interesting things about some of the products that we have, like the passive experience, latency matters a ton in a way that's like very different than some of these API providers. I think for the API providers, time to first token is important, but it doesn't matter that time to first token is 100 milliseconds. For us, that's the bar we are trying to look for.

Can we get it to sub a couple hundred milliseconds, and then hundreds of tokens a second for the generation time? So much faster than what all of the providers are providing in terms of throughput as well, just because of how quickly we want this product to kind of run. And you can imagine there's a lot of things that we want to do, right? How do we run? How do we do things like speculative decoding? How do we do things like model parallelism?

How do we make sure we can batch requests properly to get the maximum utilization of the GPU, all the while not hurting latency? That's an important thing. And one of the interesting things, just to give some of the listeners some mental model, GPUs are amazing. They have a lot of compute.

If I were to draw an analogy to CPUs, GPUs have over two orders of magnitude more compute than a CPU. It might actually be more on the more recent GPUs, but keep that in mind. But GPUs only have an order of magnitude more memory bandwidth than a CPU. So what that actually means is, if you do things that are not compute-intense, you will be memory-bound. So that necessarily means to get the most out of the compute of your processor, you need to be doing a lot of things in parallel.

But if you need to wait to do a lot of things in parallel, you're going to be hurting the latency. So there's all of these different trade-offs that we need to make to ensure a quality of experience for our users that we think is high for the product. And we've obviously mapped out all of these. We've seen how, hey, like...

If we change the latency by this much, what is this change in terms of people's willingness to use the product? And it's very stark, right? Like a 10 millisecond increase in latency affects people's willingness to use the product materially. It's percentage points that we're talking about. So these are all parts of the inference stack that we've needed to optimize.
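That compute-versus-memory-bandwidth point can be made concrete with a back-of-the-envelope roofline calculation. The numbers below are rough, illustrative magnitudes rather than any specific GPU's datasheet, and the model is deliberately crude (one streaming pass over the weights per decode step, two FLOPs per weight per sequence). It just shows why small batches leave the GPU memory-bound and why batching raises utilization at the cost of possibly making users wait.

```python
# Rough, illustrative magnitudes -- not a specific GPU or model.
WEIGHT_BYTES  = 14e9   # ~7B-parameter model held in fp16
MEM_BANDWIDTH = 2e12   # bytes/second streamed from GPU memory
PEAK_FLOPS    = 1e15   # FLOP/s from the tensor cores

def decode_step_time(batch_size: int) -> float:
    """Lower bound on one decoding step: weights are streamed once and shared
    across the batch, while compute grows linearly with the batch size."""
    memory_time = WEIGHT_BYTES / MEM_BANDWIDTH
    compute_time = 2 * (WEIGHT_BYTES / 2) * batch_size / PEAK_FLOPS
    return max(memory_time, compute_time)

for batch in (1, 8, 64, 512):
    step = decode_step_time(batch)
    print(f"batch {batch:>3}: {step * 1e3:5.1f} ms/step, "
          f"{batch / step:>9,.0f} tokens/s aggregate")
```

With these made-up numbers, per-step latency barely moves until the batch gets quite large, while aggregate throughput climbs roughly linearly with batch size, which is exactly the tension between waiting to form batches and keeping suggestions snappy.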

Is latency important enough, or does the location factor into this? Physically how close people using Windsurf are to wherever your servers and then your GPUs are running? Can you talk more about that as well? You do need to worry about that. The speed of light starts mattering.

Interestingly, this is not something I would have expected, but we do have users in India. And interestingly, the speed of light is not actually what is bottlenecking their performance. It's actually the local network. So just the time it takes for the packet to get from, maybe, like, their home to the major ISP... somehow there's a lot of congestion there. And that's the kind of stuff that we need to kind of deal with.

But by the way, that is something that we just cannot solve right now. So you're totally right. The data center placement matters. Like, for instance, if you only have a data center in Sydney and you have people in Europe, they're not going to be happy about the latency. So we do think about where the locations of our GPUs are to make sure that we do have good performance. But there are some places where there are some issues that even we can't get around.

Now, the last time I heard this complaint was before Windsurf. Because this came up with, actually, again, someone who's using Windsurf and these tools a lot, who said that specifically for one of the tools, he can tell that the data centers are far away, because it's just slow.

Cloud development environments had the exact same thing because they were similar, right? Like this was, I'm not sure they're as popular right now, but there was a time where it looked like it might be the future. You just log onto your remote environment, which is running on CPUs or GPUs somewhere else.

And again, I think it might have to do with... when you're typing, like when I'm using it, I mean, I'm just used to, like, I do want sub-second, probably like a few hundred milliseconds. I just notice that you feel it's slow, and it just bothers you. No, I agree. I think if, every time I typed a keystroke, the key showed up a couple hundred milliseconds later, I would rage quit. That would be a terrible experience.

How do you deal with indexing of the code? So you're going to be indexing, you know, depends on the code base, it'll be more or less, but if you add it up, I'm sure we're talking billions or a lot more in code. And for your enterprise customers, you might actually have... the hundreds of millions or even more lines of code. Is there anything novel or interesting that you're using or is it just kind of tried and proven things, for example, that search engines might use?

It's a little bit of both, to be honest. And what I mean by that, that's not a very clean answer. We do try approaches that are embedding based. We have approaches that are keyword based on the indexing. Interestingly, actually, one of the approaches that we've taken that's very different than search, and maybe actually systems like Google actually do this,

is we not only actually look at just the retrieval, we do a lot of computation at retrieval time. So what that means is, let's say you want to go out and ask a question. One of the things that you can go out and do is ask it to an embedding store and get a bunch of locations. What we found was the recall of that operation was quite low. And one of the reasons why that happens is embedding search is a little bit lossy. Let's say I was to go to a code base and ask, hey, give me all cases

where this function, this Spring Boot version X type function, was there. I don't think anyone would believe embedding search would be comprehensive. You're taking something that is very high dimensionality and reducing it to something very low dimensionality without any knowledge of the question. That's the most important part. So it needs to somehow encode all the possible, be relevant for all the possible questions.

So instead, what we decided to do is take a variety of approaches to retrieve a large amount of data, and that could include the knowledge graph, that could include the dependencies from the abstract syntax tree, that could include keyword search, that could include embedding search. And you kind of fuse them all together, and then after that we throw compute at this and actually go out and process large chunks of the codebase at inference time,

and go out and say, hey, these are the most relevant snippets. And this gives us much higher precision and recall on the retrieval side. And by the way, that is very important for an agent, because imagine if an agent doesn't have access

and doesn't deeply understand the code base, all the while the code base is much larger than the context length of what an agent is able to take in, right? So we've... you know, optimizing the precision and recall of the system is actually something that we spent a lot of time on and built a lot of systems for.
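A minimal sketch of that fuse-then-rerank shape, with a hypothetical `Snippet` type and placeholder retriever and `score_relevance` callables rather than Windsurf's actual components: several cheap retrievers contribute to a high-recall candidate pool, and model compute is then spent at query time to rerank it down to a precise set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Snippet:
    id: str     # e.g. "src/auth/session.py:120-160"
    text: str

def retrieve_candidates(query, retrievers, per_retriever=50):
    """Union the outputs of several retrievers (keyword, embedding, AST
    dependencies, knowledge graph, ...) into one high-recall candidate pool."""
    pool = {}
    for retriever in retrievers:
        for snippet in retriever(query)[:per_retriever]:
            pool[snippet.id] = snippet          # de-duplicate across retrievers
    return list(pool.values())

def rerank_with_compute(query, candidates, score_relevance, top_k=20):
    """Spend model compute at query time: score every candidate against the
    query and keep only the best, trading FLOPs for precision."""
    ranked = sorted(candidates, key=lambda s: score_relevance(query, s), reverse=True)
    return ranked[:top_k]
```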

It's interesting because it shows how, A, it's code, so you can more easily work with it, especially with certain keywords, for example, in some languages. I can imagine that you can even list all the keywords that are pretty common, and you can decide if it's a keyword or if it's something

special, where, and if it's a keyword, you can already just, like, do it. And it's interesting how you can combine the kind of old-school, before-LLMs techniques and then add the best parts of LLMs, but not forgetting about, you know, what worked before. That's right. I wonder if there's other...

Any other industry that has this? We do have this lower dimensionality space in terms of the grammar and all these things. We understand the usage pretty well. And then the users are power users who actually, the same people use it who could actually build, you know, this tool. Yeah, you know, I feel like Google's system is probably ridiculously complex and sophisticated, for obvious reasons, just because,

for one, they've been doing this for so long, and obviously they've been at the top for such a long time. And then also on top of that, the monetary value they get from delivering great search is so high, given ads, that they are incentivized to throw a lot of compute, even at query time, right? To make sure that the quality of suggestions is really good. So I assume they're doing a lot of tactics there. Obviously, I'm not privy to all the details of the system.

Well, it's interesting because I would have agreed with you until recently, but there are some search engines that are getting really good results. So I wonder if Google is less focused on the actual haystack and the needle and maybe more on revenue, or maybe they're doing it invisibly. I'm sure they're doing an amazing job, by the way, under the hood, but I wonder if some of that knowledge has commoditized, but we'll see. But moving on from indexing, in terms of databases,

what kind of databases do you use and what challenges are they giving you? I'm assuming you're not just going to be happy with the usual, let's put everything in Postgres. Or do you actually? You might be able to. I don't know. It sounds like these days Postgres can be used surprisingly well even for embeddings. Yeah, you know, I think we do a combination of things. So we do like some amount of local indexing. We do some remote indexing as well. Local indexing on the user's machine.

Nice. In some ways, the benefit of that is it helps you build up. If you were to say, hey, you have some understanding of the code base. The problem is that understanding changes very quickly as the user starts changing code, starts checking out new branches. And you don't want to basically say all of your information about the code base you need to throw away. So it's good to have some information about the user's history and what they've done locally.

In terms of remote, I think it would be a lot simpler than people would imagine. One of the complexities of our product, the reason why the product is very complex is actually the fact that we need to run all of this GPU infrastructure. That's actually a large chunk of the complexity because if you were to look at our QPS, our QPS is high, but it is not like tens of thousands of QPS.

Actually, it doesn't need to be that high because in some ways, actually, each of the queries that is happening is actually a really expensive query. It's doing trillions of operations remotely. So actually, the complexity of the problem is how do you optimally... Do that. Right. So we can actually get away with things like Postgres. Like we're not. In fact, I would say I like to keep things pretty simple if it's possible to keep things very simple.

We should not be rolling any type of our own database. I think databases are very, very complex pieces of technology. I think we're good engineers, but we're definitely not good enough to kind of, like, on the side build our own database. And then for local indexing, what database do you use? Yeah, we have our own combination of, like, a SQL-based database. We have a local SQL database and then some sort of embedding databases as well that we store locally.
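As a toy illustration of what a local index of that shape could look like, the sketch below pairs a SQLite FTS5 table for keyword lookups with a plain table of per-chunk embedding vectors (stored as JSON purely to keep the example small). The schema is invented for illustration, not Windsurf's, and it assumes a SQLite build with FTS5 enabled, which standard Python builds usually include.

```python
import json
import sqlite3

def create_local_index(path: str = ":memory:") -> sqlite3.Connection:
    """Toy local code index: full-text search for keywords, plus stored embeddings."""
    db = sqlite3.connect(path)
    db.execute("CREATE VIRTUAL TABLE chunks USING fts5(file, body)")
    db.execute("CREATE TABLE embeddings (file TEXT, chunk_id INTEGER, vector TEXT)")
    return db

def add_chunk(db, file: str, chunk_id: int, body: str, vector: list) -> None:
    db.execute("INSERT INTO chunks (file, body) VALUES (?, ?)", (file, body))
    db.execute("INSERT INTO embeddings VALUES (?, ?, ?)", (file, chunk_id, json.dumps(vector)))

def keyword_search(db, term: str) -> list:
    return db.execute("SELECT file, body FROM chunks WHERE chunks MATCH ?", (term,)).fetchall()
```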

What is your view on the value of embedding databases? This has been an ongoing debate for the past while, since ChatGPT became big. Again, there are two schools of thought. One is, we do need... embedding-based databases, because they can give us vector search, they can give us all these other features that LLMs and embeddings will need. And the other school of thought is, well, let's just expand relational databases, we add a few extra indexes, and boom, we're done.

From, you know... you're more of a user of this, but you're a heavy user at Windsurf and Codium. What pros and cons are you seeing? I'm just trying to get you to go in one direction or the other. It's a good question. So our viewpoint on embeddings is probably that they don't solve the problem by... They actually just do not. So the answer is going to be mixed. Another question is, why do we even do it in the first place, right?

And I think it really boils down to it's a recall problem, right? When you want to do a good retrieval, you need the input to what you're willing to consider to be large and high recall. If you were to think about it, the problem is if you only have something like keyword search and you have a very, very large sort of code base, actually, what happens if the user typos something?

Then your recall is going to be bad. The way I like to think about it is each of these approaches, keyword search, knowledge graph-based retrieval, all of them, they're all in different circles. What you're trying to do is get something where the union of these circles is going to give you the highest recall.

ultimately for the retrieval query. I think embedding can give you good recall because it is able to summarize or actually able to distill somewhat of semantic information about the chunk of code, the AST or the file or the directory and all this. What I would say is it's a tool in the toolkit. You cannot build our product entirely with an embedding system, but also does the embedding system help? I think it actually does help. It does improve our recall metrics.
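The union-of-circles intuition is easy to see with a toy recall calculation. The files and retriever outputs below are invented; the point is just that each retriever misses different things, and the union is what pushes recall up.

```python
def recall(retrieved: set, relevant: set) -> float:
    return len(retrieved & relevant) / len(relevant)

# Ground truth: the files a real change actually needed to touch.
relevant = {"auth/session.py", "auth/tokens.py", "tests/test_auth.py"}

keyword_hits   = {"auth/session.py"}                     # misses a typo'd identifier
embedding_hits = {"auth/session.py", "auth/tokens.py"}   # semantic but lossy
graph_hits     = {"tests/test_auth.py"}                  # pulled in via co-change links

for name, hits in [("keyword", keyword_hits),
                   ("embedding", embedding_hits),
                   ("knowledge graph", graph_hits)]:
    print(f"{name:>15}: recall {recall(hits, relevant):.2f}")

union = keyword_hits | embedding_hits | graph_hits
print(f"{'union':>15}: recall {recall(union, relevant):.2f}")   # -> 1.00
```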

So I talked with your head of research, Nicholas Moy, and he told me about a really interesting challenge that you're facing, which he called the split brain situation. He was basically saying that it's almost like the team and everyone on the team needs to have two brains. One is just being aggressively in the present, shipping improvements as you go, but also then do a long-term vision where you're building for the long run.

How do you do this? Like, how do you start doing it and how do you keep doing it? You did mention earlier, right, that half the team is working on other stuff. But do you kind of split people, so, like, people focus on short-term or long-term? Or does everyone, including you, juggle these things in your head day-to-day? It's an interesting one.

Yeah, I don't want to give myself that much credit here. I think our engineers probably should be given most of the credit here. But I think in terms of maybe company strategic direction, both me and my co-founder, the CTO of the company, he... We try to think a lot about how do we disrupt ourselves.

Because I think it's very easy to get into a state where, hey, I added this cool button. I added this way to control X with a knob. And you keep going down this path. And yeah, your users get very happy. But what happens if tomorrow I told you users don't need... and it's an amazing experience and it's like a better experience? Users are gonna feel like, why do I need to do this? So here's the thing: users are right, up to a certain point.

By the way, if they can, then we should not be doing this. They will not be able to see exactly what the future solution should be. If our users can see the future solution better than we can, we should just pack up our bags and leave at that point. What are we actually doing?

So I think basically, you know, you have this tension here where you need to build features to make the product more usable today, right? And our users are 100% right. They understand this. They face pain through many different axes that we don't and we should listen.

But also at the same time, we might have an opinion and a stance on where coding and where these models and where this product can go that we should go out and build towards. And we should be expending a large amount of our engineering capital on that.

Can you talk about some kind of bets that you're... having? You know, not necessarily giving away everything, but, like, some promising directions that might or might not work out, or even, in the past, some problems that maybe did not work out. Yeah, I'll tell you a lot of them. Yeah, so we failed a lot. And I think failing is great. And one of the things that I tell our engineers is, like, engineering is not like a factory.

It's actually, you have a hypothesis, you go in, and you shouldn't be penalized if you failed. Actually, I love the idea of, hey, an idea sounds interesting, we tried it and it didn't work, because we at least learned something. And learning something is awesome. And I'll give you an example. The agent work that we did, we didn't even start at the beginning of last year. We started even before the beginning of last year.

It was not working for many months. And actually, Nick Moy was working on, who you probably spoke with, was the one who was working on some of this stuff. For a long time, a lot of what he was doing was just not working. He would come to us and we would say, okay, fine, it doesn't seem like it's working, so we're definitely not going to ship this, but let's keep doing it.

Let's keep working on it because we believe it's going to get better and better. But it was failing for a long time, right? We came out with a review product right in the beginning of last year, or around then, called Forge, for code reviews. We thought it was kind of useful internally at the company and we thought we could continue to improve it. People did not find it that useful.

It was not actually that useful. We were going in with the assumption, code reviews take a long time, what if we could help people? And the fact of the matter was, the way we thought we could help people wasn't actually material enough for people to want to take on this new tool.

Right. And there's a lot of things that, sort of, obviously, we've tried in the past that just didn't work the way we thought they would. And, you know, for me, I think I would be totally fine if 50 percent of the bets we make don't work. Yeah, and a lot of startups say that. And then, after all, what I noticed is, as a company becomes bigger, I saw this at Uber, it's actually not really the case. There's, like, failures, kind of...

On paper, it's embraced, but actually it's not. So I think there's this tricky thing that when it's actually meant, it's awesome. Otherwise, people just start to polish things and make things look good when they're not. Pretend that it's not a failure, but it was a success.

We're just walking away, that kind of stuff. So it's nice to see that you're doing it. What was the thing that turned the agents around, which then I assume became Cascade? Was it a breakthrough on your end? Was it the models getting better? Was it a mix of something else?

Yeah, I think it was a handful of things. So I'll walk through it. So first of all, the models got better. 100% the models got better. I think even with all the internal breakthroughs we had, if the models hadn't gotten better, we wouldn't have been able to release it. So I don't want to trivialize that matter. It was huge. The two other pieces that were quite important were: our retrieval stack was also getting better,

which I think enabled us to work much better at these larger code bases. I guess table stakes is being quite good at zero-to-one programming. But I think the thing that was groundbreaking to us was our developers on a complex code base were getting a lot of value from it. And I would say something quite interesting, which is that ChatGPT by itself wasn't incredibly groundbreaking to our developers inside the...

And that's not because ChatGPT is not a very useful product. ChatGPT is a ridiculously useful product. It's actually just because you need to think about it from the perspective of opportunity cost and how much more efficient you are. A lot of our developers have been developers for a long time, and I think we do have an exceptional engineering team. They were used to using Stack Overflow and all these other tools to get what they wanted.

Suddenly, when the model had the capability to not only understand your codebase but start to make larger and larger changes, it changed the behavior of the people inside the company. And not only make changes: we built systems to very quickly edit the code. We built models to take a high-level plan and make an edit to a piece of code very fast. So all of these together made this a workflow that our developers wanted to use.

We had the speed covered. We had the fact that it understood the codebase well. And then we also had massive model improvements to actually be able to call these tools and make these iterative changes. I don't want to diminish that. You have all of these, and suddenly you have a real product. I've been meaning to ask you this: how is the team using Windsurf to develop Windsurf? Because you're doing it, right? You just told me how.

You're doing it. From two perspectives. One, technical feasibility: I'm assuming you're not going to work on the exact same codebase, you have a fork or a build or something like that. And on the other hand, do you kind of force people to dogfood? Do people just do it? Do people get stuck on certain versions, do they turn on features for themselves, et cetera?

So the way we do it is we do have like an insider's developer mode. So this enables us to test new features. I guess anyone at the company should be able to create a feature and then deploy it to everyone internally.

And now we have a large number of developers who will give feedback. We have the ability for our own developers to dogfood new releases. We can have our own developers say, I hate this thing, please don't ever do this. And it's nice because then we don't need to give it to other developers. So I think we have this tiered system at the company. We have our own sort of release,

we have Next, which is for future-looking products that we are releasing that are a little bit more raw, and then we have the actual release that we give to developers. We're willing to A/B test things, but we're not willing to A/B test in such a way where we give people a comically bad experience just to A/B test them. That's bad because people are using this for their real work. So if you're using it for your real work, we don't want to be hurting you.

I think one of the things that's quite valuable to us, and you would probably think this is a failure mode for our company, is that we use Windsurf, largely speaking, to modify a large codebase. For obvious reasons: our developers aren't building these toy apps over and over again. One of our biggest power users inside our company is actually a non-developer. He leads partnerships, he's never written software before, and he routinely builds apps with Windsurf.

Right? And he's one of our biggest users inside the company. And we've used this to replace buying other SaaS tools. He's actually even deployed some of these tools inside the company. What function is this person in?

It's partnerships. So I'll give you an example of some of the tools. These are not complex pieces of software, but you would be surprised at how much they actually cost. They're six figures in cost because it's bespoke software, right? I'll give you an example. You have a quoting tool.

So the idea of a quoting tool is: you have a customer, the customer is this size, they're in this vertical, they want this kind of deal, here's the way it would look, here's the amount of discount we're willing to give them as a customer.
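To make that concrete, here is a minimal sketch of the kind of logic such an internal quoting tool might encode. The verticals, tiers, and discount numbers are made up for illustration; they are not Windsurf's actual pricing rules.

```typescript
// Hypothetical quoting logic: company size, vertical, and deal type map to a discount.
// All names and numbers below are illustrative, not Windsurf's real rules.
type Vertical = "fintech" | "healthcare" | "gaming" | "other";
type DealType = "pilot" | "annual" | "multi-year";

interface QuoteRequest {
  customer: string;
  seats: number;
  vertical: Vertical;
  dealType: DealType;
  listPricePerSeat: number; // USD per seat per year
}

function discountFor(req: QuoteRequest): number {
  let discount = 0;
  if (req.seats >= 500) discount += 0.15;               // large volume tier
  else if (req.seats >= 100) discount += 0.08;          // mid volume tier
  if (req.dealType === "multi-year") discount += 0.1;   // longer commitment
  if (req.vertical === "healthcare") discount += 0.05;  // strategic vertical
  return Math.min(discount, 0.3);                       // cap total discount
}

function buildQuote(req: QuoteRequest) {
  const discount = discountFor(req);
  const total = req.seats * req.listPricePerSeat * (1 - discount);
  return { customer: req.customer, discount, total };
}

// Example: a 250-seat annual deal in fintech.
console.log(buildQuote({
  customer: "Acme Corp",
  seats: 250,
  vertical: "fintech",
  dealType: "annual",
  listPricePerSeat: 300,
}));
```

The point is less the arithmetic than that a domain expert can describe rules like these in plain language and iterate on an app around them, with an engineer reviewing the result before it is deployed internally.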

And usually these are really systems that you would need to pay a lot of money for. And the reason is because there's no reason for our developers to go out and build this internally; it's a big distraction from going out and building our product. But now, on the other hand, you have a domain expert. The person that actually runs partnerships doesn't know software, but he knows this really well.

And because of that, he's able to create these apps really quickly. And granted, we do have a person inside the company that looks at the app, makes sure that it logistically makes sense, is secure, and can be deployed inside the company. But these are more ephemeral apps, right? They're quite stateless. If you were to look at the input and output of this app, it is not as complex as, let's say, the Windsurf product itself.

But now we have this growing set of people inside the company that are not developers that are getting value from this, which we found a little surprising. Yeah, and can you also give maybe some other examples of what you think this might replace? The reason being, I'm actually really interested in this because I do hear a lot of people, either on social media or CEOs, saying that for SaaS apps this

could be the end of it. And I've always been skeptical, for the reason that there are two types of SaaS apps. Most of the SaaS apps I see are, for example, like Workday, which is an HR platform: they will have hosting, they will have business rules, they will update to some extent with regulations and all that stuff. So they do a lot of stuff beyond the UI. I know we can trivialize the UI, but it's a lot more than that.

And then there are a few of these simpler ones. I don't want to name names, but there's a polling app where, internally inside the company, you can run polls. It has state, but it's relatively simple. I could build it, but I just don't want to deal with authentication and hosting it inside the company, and it's already there. And then there are the ones you mentioned that are stateless. So what kinds of

SaaS tools do you see being replaced? And might you see other companies, maybe with one dedicated developer, using tools like this to build it internally and bring it in-house? I think it's hubris to believe that products like Workday and Salesforce get replaced by this. I think you're totally right. These products have a lot of state. They encapsulate business workflows. And for a product like Workday, there's probably compliance

that you need to do because of how business-critical the system is. So this isn't the kind of system that this would replace. It probably falls in the latter two categories, and probably even just the last one, which is to say these stateless systems that don't write to the most business-critical parts of your database.

It's probably those kinds of systems that can very quickly get replaced. And I would say there's a new category: think about the amount of software that would benefit a business that just isn't getting created, that now could get created. Because the reason why that software couldn't get created is

a company couldn't be created that would be able to sustain itself, that would have an economic, a business model that would justify it existing. But now, since the software is very easy to create, these pieces of software are going to proliferate, right? And one of the things that I'd like to...

talk about for software is that the cost of building software, of simple software, used to be a lot higher. Right now, for a front end, we have to admit it's gotten a lot cheaper to build a basic front end. Radically. Radically cheaper. So the way I would look at it is, for these kinds of systems,

What are you really paying for when you pay a SaaS vendor? You're not only paying for your product, you're paying for the maintenance, you're paying for the fact that actually, you know, this company actually is building a bunch of other features that you don't need. And the reason why is because they need to support a bunch of customers, but you're still paying for that R&D.

You're paying for their sales and marketing, and a bunch of other stuff. So my viewpoint is, if you can build custom software for yourself that is not very complex but helps you in your own business process, I think that might proliferate inside companies. And that might actually cause a whole host of companies that fall into that category, simple business software that feels largely stateless, to have trouble unless they reinvent themselves.

Yeah, and I guess one obvious reinvention that could happen later: once companies are building a lot of internal software, they might start to have some similar problems. Take three to five years of that down the road: maintenance, storage,

compliance, just going through whether they're still working, re-evaluating whether it makes sense to consolidate them into something. So this could create a lot of new opportunities for other software businesses or software developers, or maybe these companies themselves, or maybe a new job role in software engineering: I'm now specialized, I've built so many of these apps, and I can help you with them. Who knows?

No, I think a lot of people talk about how we're going to have way fewer software engineers in the near future. It feels like it's people that hate software engineers, largely speaking, that say this. It feels pessimistic, not only towards these people, but I would say in terms of what the ambitions for companies are. I think the ambition for a lot of companies is to build a much better product.

And if you now give companies the ability to have a better return on investment for building technology, because the cost of building software has gone down, what should you be doing? You should be building more. Because now the ROI for software and developers is even higher, because a single developer can do more for your business, right? So technology actually increases the ceiling of your company much faster.

Yeah, and I'm going to double-click on that, because you have been building Windsurf and these tools, but you've also worked with the same team even before these tools. Take one of your solid engineers today who was a solid engineer four years ago. How has their work changed now that they have access to Windsurf, agents,

Cascade, all these other tools, including ChatGPT, et cetera. What's changed? And not just your engineers, but also the team that you had four years ago that was doing this work. How has their work changed? I don't want to point you in any direction, I'm just interested in what you would say. How does it seem different in what they do, how they do it, or how much they do?

I think there are maybe a couple of things. First of all, the amount of code that we have in the company is quite high and now dominates what a single person can know at the moment. In the beginning of the company, that was not the case, so this is something you can't point to when companies are quite small. Right now, I would say there's more fearlessness to jump into a new part of the codebase and start making changes.

I would say in the past, you would more say, hey, this person has way more familiarity with this part of the code. That is still the case. When you say familiarity, now it's like understanding the code, but this person also knows where the dead bodies are.

Which is to say, hey, you did X and you got Y, and that means you always should do Z, right? And there are still people like that at the company. I'm not saying that that is not valuable, but I think now engineers feel more empowered to go out and make changes throughout the codebase. And the second key piece is our developers now go to the AI first to see what value it would generate for them before making a change themselves.

Which is something where, I would say, in the autocomplete days, you would go out and type and you would get a lot of advantage from autocomplete and the passive AI. But now the active AI is something that developers more and more reach towards to actually go out and make changes at the very beginning. I'm interested in how this will change software engineering, because I also noticed both things in myself.

I still code and I do my side projects, but I always drag my feet getting back into the context of the code that I wrote, which I kind of forgot part of, and getting back into the language, because I use multiple languages across side projects. And AI does help me just jump into it. I no longer have that friction. And sometimes, yeah, I just prompt the AI saying, what would you do?

I just want to know. And then if it looks good, I do it. If not, I just scrap it. Maybe I prompt again, or sometimes I just think, nah, I'm going to do it myself, because I didn't give it the right instructions. There's this thing, especially when you're working on stuff where you know the codebase, you've onboarded, you know what you want to do, but I think it really helps. It helps me at least with effort. Sorry,

with a thing that wouldn't take much creativity, but would just be time, a drag: figuring out the right things, finding the right dependency, changing those things, that kind of stuff. I think you're exactly right. I think this reducing-friction piece is something where it's kind of hard to quantify the value, because it makes you more excited to do more.

I think software development is a very weird profession, and I'll give you an example of why it's weird. A lot of people would think, oh, this is a very easy job, and I actually think it's quite hard on you mentally. I'll walk you through what I mean by that. You're doing a hard project. You sometimes go home with an incomplete idea. The code didn't pass a bunch of tests.

And it just bothers you when you sleep. And you need to go back and kind of fix it. And this could be for days. And for other jobs, I don't think you kind of feel that, right? It's a lot more procedural, potentially, for other types of jobs. I'm not saying for every job. There are obviously jobs where there's a massive problem-solving component.

But that just means that you do get a fatigue: at some point, even the easy things, being forced to do new easy things, add some amount of mental fatigue. And I think you now have a very powerful system that you trust, which should ideally reduce this fatigue and be able to do a lot of the things that in the past had high activation energy, and do them really fast.

Yeah, this is really interesting, because I was just talking with a former colleague of mine who had a few months where he just wasn't producing much code. Really good engineer, really solid. And at the time, I didn't know why, and he didn't tell me, and then he kind of snapped out of it. But we were just talking, and he said that actually he was at a really bad time in his life: lots of stress in a relationship and at home with family, all these things. And he said that

he just realized how mental a game software engineering is. At work, he just couldn't get himself to get into the zone. We know how it is, especially before AI tools. And with what you said, I'm starting to get a bit of an appreciation for the fact that, I remember, I couldn't turn off: you go home, you're having dinner, you're still thinking about how you would change that or why it's not working.

I don't think we can go on about this forever, but I think for listeners it's worth thinking about how weird it is. I think it's good to reflect on it, because it is a unique profession. For so many jobs, you can just put down your work and leave the office, and you cannot continue. And that's it, you cannot even think about it, because all your work is there. And also, how these tools might just change it

for the better in many ways, and maybe in weird ways that we don't expect in others. No, I think you're totally right. I think this is why finding amazing software engineers is rare. It's rare because these people have gone through this and are willing to put themselves through it: taking all of the learnings they had, from the lowest level to the highest level, and then being willing to go down into the weeds

to make sure you solve the problem. It's a rare skill. You would imagine, hey, this is something that everyone would be able to do, but it takes a lot of dedication, and as you pointed out, it's not a very normal activity.

Yeah. Well, going back to engineering challenges and decisions, one super interesting thing that I've been dying to ask you: you did mention in the beginning that when you started Windsurf, you realized Visual Studio Code is just not where it should be. However, you started by forking Visual Studio Code, right? Do I have that right? That's exactly right.

Can you tell me the pros and cons of doing this as opposed to building your own editor? I'm aware that there are some downsides of doing this, some licensing things. So that's one part of the question. The second part of the question: why did you think that forking was the right move to build a much better, much more capable thing than whatever VS Code was back in the day?

Yeah, so maybe some clarification on terminology. VS Code is a product that is built on top of Code OSS, which is basically the open source project. I did not know that. Yeah, because VS Code has proprietary pieces on top of the open source. On top of the open source. I do know that, and a lot of people don't know that, actually. Yeah, exactly. So one of the things we actually did was we wanted to make sure we did this right.

And what I mean by that is, when we actually built our product, we did fork Code OSS, but we did not support any of the proprietary pieces that Microsoft had, and we never actually provided support for those, not through a marketplace or anything. We actually use an open marketplace.

And it's completely fine. And this, by the way, forced us to actually build out a lot of extensions that people needed and bake them into the product. I'll give you an example. For Python language servers, we now have our own version. For remote SSH, we have our own version. For dev containers, we have our own version. This actually forced us to get a lot tighter on what we need to do, and we never took, I guess, a shortcut

of, hey, let's go out and do something that we shouldn't be doing. Because we work with real companies, we work with real developers, and why should we be putting them in that position? I guess we kind of took that position. So that was the positioning we had. Obviously there were some complexities, but this just caused us more engineering effort before we launched the project.

We did launch the product with the ability to connect to remote SSH and do all this other stuff, and we did have an internal engineering effort to actually go out and do that. Now the question might be, why even fork VS Code? I think it's because it's a very well-known IDE where people have a workflow. There are also many extensions

there that people rely on that are extremely popular. An IDE is not just the place where you write software, it's also the place where you attach a debugger and do all these other operations. We didn't want to reinvent the wheel on that. We didn't think we were better than the entire open source community

in terms of all the ways you could use the product. And I'll give you an example of how we're trying to be pragmatic here. We didn't go out and try to replace JetBrains with this product. We actually put all the capabilities of Windsurf into JetBrains in what's called the Windsurf plugin. Our goal is to meet developers where they are. Meeting VS Code developers where they are means we should give them a familiar experience.

Meeting JetBrains developers where they are means we should give them a familiar experience, which is to actually use JetBrains. And now a question might be, why didn't we fork JetBrains? There are two reasons. First of all, we can't; it's closed source. Second of all, it's actually because JetBrains is a fantastic IDE for Java developers, and in a lot of cases C++ and Python developers.

And for PHP as well, PhpStorm, if you've ever used them. That's exactly right. So they have one for almost every single language. For every single language. And the reason is they have great debuggers, great language servers, that I think are not even present in VS Code right now. If you are a great Java developer, most of them, probably 80-plus percent, right now use IntelliJ.

So I think as a company, our goal is not to be dogmatic. Our goal is to build the best technology, democratize it, and provide it to as many developers as possible. I love it. I was talking with one of your software engineers, who did mention an interesting challenge because of just this, the fact that

you do have a JetBrains plugin, and then you have the IDE, and now apparently you're sharing some binaries between the two. Can you talk a little bit about that engineering? Yeah, so this was actually an engineering decision we needed to make a couple of months into starting to work on Codeium, which was that, hey, we're going to go out and build a VS Code extension. That's what we started out with. But very quickly, the next step is, let's go implement it in JetBrains.

The problem is, if we need to duplicate all the code, it's going to be really, really annoying for us to support them all. So what we decided to do was actually go out and build almost a shared binary between both, which we call the language server, that actually does the heavy lifting.
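To illustrate the shape of this architecture, here is a minimal sketch of how a thin editor plugin might talk to one shared language-server binary over stdin/stdout. The binary name, message format, and field names are assumptions for illustration; Windsurf's actual protocol isn't described in this conversation.

```typescript
// Hypothetical sketch: an editor plugin (VS Code fork, JetBrains, Vim, ...)
// delegates heavy lifting to one shared "language server" process.
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

interface CompletionRequest {
  kind: "completion";
  filePath: string;
  cursorOffset: number;
}

interface CompletionResponse {
  kind: "completion";
  suggestions: string[];
}

class SharedLanguageServerClient {
  // Assumed binary name; the server is the same executable for every editor.
  private proc = spawn("windsurf-language-server", ["--stdio"]);
  private pending: Array<(resp: CompletionResponse) => void> = [];

  constructor() {
    // The server writes one JSON response per line; resolve callers in order.
    createInterface({ input: this.proc.stdout }).on("line", (line) => {
      const resp = JSON.parse(line) as CompletionResponse;
      this.pending.shift()?.(resp);
    });
  }

  // The plugin only forwards requests; indexing, retrieval, and model calls
  // all live behind the shared binary.
  requestCompletion(req: CompletionRequest): Promise<CompletionResponse> {
    return new Promise((resolve) => {
      this.pending.push(resolve);
      this.proc.stdin.write(JSON.stringify(req) + "\n");
    });
  }
}
```

With a split like this, supporting a new editor mostly means writing a thin client of this shape, rather than reimplementing the heavy lifting per IDE.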

So the goal there is that hopefully we're not just duplicating the work in a bunch of places. And this enables us to support many, many IDEs from an architecture standpoint. That's why we were able to support not just JetBrains but Vim and all of these other IDEs that are popular, with much less work. Okay. I need to ask you about MCP. You have started to support it, which is really cool. I play around with it, and I think it's a good first step.

What is your take on MCP, especially with the security worries? And also, where do you see MCP going right now? I think it's a bit of an open book, but you are probably a bit more exposed to this than most listeners will be. You know, I think it's very exciting. I have maybe one concern, but let me start with the exciting part. The exciting part is that it's now democratizing access to everything inside a company, or everything a user would want within their coding environment.

For our product in particular, that is; obviously there are other products, maybe it can help you buy goods and groceries and stuff like that, and obviously we're not that interested in that use case. But one of the other things it lets companies do is implement their own MCP servers with security guarantees, which is to say they can implement a battle-tested MCP server that talks to an internal service

and all these other things for the end user, and they can own the implementation of that. So there's a way for companies now to let users interact with their internal services in a secure way. But you're totally right. There could be a slippery slope where this causes everyone to have immediate access to everything in a write-based fashion, which could have negative consequences. But the thing I'm particularly maybe a little bit

worried about, and it's not worried, it's more about the paradigm itself: is MCP the right way of encapsulating talking to other systems, or is it actual workflows of developers? One of the problems with MCP is it forces you to hit a particular spec. And you know, actually the best spec is flexibility.

It's flexibility. And if you ask these systems now to integrate with another system, like you ask an LLM, a GPT-4.1 or a Sonnet, hey, build an integration to this system, to a Notion, it will do it zero-shot. So you could build an MCP server that is particular, that only lets you have access to two things in Notion. Or the models themselves are capable of doing a lot. And it's like, how much do you want to constrain versus have freedom?

And then there's also the corresponding security issue. Look, it's awesome that we have access to it. Is this the final version? I don't know if this is the final version. Yeah. I'm going to rephrase it; tell me if I'm off.

When I set up, for example, a web project and I'm using Node, I have my package.json that specifies what packages I'm going to use. Now, on my machine I will have a lot of packages installed, but for each specific project I'm going to be very clear about what I want to use, which packages, maybe a subset of them. Right now it feels to me that the current version of MCP just lets me connect everything. I can't

really say, for example, that on this project I actually want you to only talk to this table in my database, I don't want you to access all the other stuff, because it's a prod database and I have a test table there. That kind of stuff, right? Are we talking about this granularity and figuring out what would actually help me as an engineer be productive?
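As a thought experiment on the package.json analogy, here is what a per-project scope manifest for MCP connections could look like. Nothing like this exists in the MCP spec today; the shape and field names are purely hypothetical.

```typescript
// Hypothetical per-project manifest, analogous to package.json, declaring which
// MCP servers, tools, and resources this project is allowed to use.
// This is not part of the MCP spec; it sketches the granularity discussed above.
interface McpScopeManifest {
  servers: {
    [serverName: string]: {
      allowedTools: string[];      // only these tools may be called
      allowedResources: string[];  // e.g. restrict DB access to one table
      readOnly?: boolean;          // refuse write-style tools entirely
    };
  };
}

const projectScope: McpScopeManifest = {
  servers: {
    postgres: {
      allowedTools: ["query"],
      allowedResources: ["analytics.test_table"], // not the whole prod database
      readOnly: true,
    },
    notion: {
      allowedTools: ["search_pages", "read_page"],
      allowedResources: ["workspace/engineering"],
    },
  },
};

// A client could consult the manifest before forwarding any tool call.
function isCallAllowed(server: string, tool: string, resource: string): boolean {
  const scope = projectScope.servers[server];
  if (!scope) return false;
  return scope.allowedTools.includes(tool)
    && scope.allowedResources.some((prefix) => resource.startsWith(prefix));
}
```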

No, it's an interesting point. You're totally right. You want these systems to have access to a lot of things so that you can be productive. All the while, you want to be imperative and very instructive on what systems they should have access to.

internally. But the problem is people are, I'm not going to say lazy, but it is annoying if you have 50 services and you're going to tell it, you need to do this, you need to do that, you need to do this. And what can very quickly happen is people don't, and they get mixed results, or it has negative consequences.

Look, I think we're figuring this out. I think the whole industry is kind of figuring this out, what the right model is. And maybe it actually is a lot of engineering that needs to get done post the MCP server, which is to say the MCP server provides a very free-flowing interface.

but there's a lot of understanding on the server side of who the user is, what service they're trying to touch, what codebase they're in, and there are proper access controls implemented afterwards that help you. I'm thinking of languages that are not really that popular anymore, but when I started programming, I used C#.

And in C#, for classes and their members, you had keywords; you couldn't just access everything. You had public, which everyone can access. You had protected, accessible from child classes. You had internal, accessible only inside the assembly. And you had private, which was not accessible outside the class. These were just keywords for which module can access what parts of your code

inside the codebase. And back then, this was the 2000s, we took a lot of care deciding who can access what and how, even though technically everything could have talked to everything. We decided, over an evolution of a few decades, that that wasn't a good idea. So I'm wondering if we're going to get there with MCP, if we might reinvent some parts of it, because that didn't come about because someone

just licked their finger and made it up. It was because we needed it to organize large amounts of code back then, when we didn't have the tools that we have today. No, I think you're right. I think some primitives are missing right now, for sure. It's too free-form right now.
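To make the point about post-server access controls concrete, here is a small sketch of the kind of enforcement layer a company might put behind an MCP-style server, keyed on who the user is, which service they're touching, and which codebase they're in. The types and policy below are assumptions for illustration, not an existing Windsurf or MCP feature.

```typescript
// Hypothetical server-side gate: the MCP-style server exposes a free-flowing
// interface, and this layer decides, per user and per codebase, what gets through.
interface ToolCall {
  user: string;      // who is asking
  service: string;   // which internal service the call touches
  codebase: string;  // which repo/project the request originates from
  action: "read" | "write";
}

type Policy = Array<{
  users: string[] | "*";
  services: string[];
  codebases: string[] | "*";
  allowWrites: boolean;
}>;

const policy: Policy = [
  // Internal tooling may read the quoting service, but only from its own repo.
  { users: "*", services: ["quoting"], codebases: ["internal-tools"], allowWrites: false },
  // The platform team may read and write the deployment service from any repo.
  { users: ["platform-team"], services: ["deployments"], codebases: "*", allowWrites: true },
];

function authorize(call: ToolCall): boolean {
  return policy.some((rule) =>
    (rule.users === "*" || rule.users.includes(call.user)) &&
    rule.services.includes(call.service) &&
    (rule.codebases === "*" || rule.codebases.includes(call.codebase)) &&
    (call.action === "read" || rule.allowWrites)
  );
}
```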

It's going to be super exciting, though, because we are seeing that this is going somewhere, maybe MCP, maybe not, and we're in the middle of it. Who knows, some people listening to this might actually influence the direction of this new thing that we're going to use five years from now. It's awesome. Yeah. What is your take on the 70-30 mental model for AI tools? This is something that comes up every now and then, especially with folks who are less technical.

Today, they can prompt AI tools, from Windsurf to Lovable and others, with, hey, generate this idea that I have, and the tools do a good job of the first 70 percent, the one-shot or the tweaking. And then on the last 30 percent, especially when they're not experienced software engineers, they just get a little stuck, or hopelessly stuck. Do you observe this with Windsurf users, or is this not really a thing when people are pretty technical and are developers?

Yeah, I think we do have non-developers that use the product. And I do think the level of frustration for them, and by the way, my viewpoint on this is not just let them be frustrated. I would love to help them. But the level of frustration when they have a problem is much higher.

And the reason is because for you and me, when we go and use this and it gets into a degenerate state, where it tries to make a change and does a series of changes that don't make sense, our first instinct is not to just try again

and do it 10 more times when five times it didn't work. It's probably to look at the code, see which step didn't work, and go back to the step that worked, right? Like debugging principles. But, by the way, the reason why we do that is we understand the code. We can go back into the code and understand it. But you're right that for people who can't, they're kind of in a state of helplessness. And I deeply empathize.

And it's our job to figure out ways that we can make that a lot better. Does that mean we make our product completely catered to non-developers? No, that's actually not what we do. Are there principles from that that we can take that help both?

both groups, right? Because I think for us, we do want to get to a state where these systems can be more and more autonomous. Right now, a real developer needs to go out and fix these issues all the time when they prompt it, and that also just means we're farther and farther away from being autonomous.

So that's kind of the way we think about it. But I do think as an industry, and there are the engineers, the coders, and then the non-coders, there is a question that needs to be asked: do we eventually need to understand what the code does? Do we need to be able to read the code? Because, for example, when I was at university, we studied assembly. Now, I never really programmed assembly beyond the class, but

I have since come across assembly code and I'm not afraid to look at it. Again, I'm not saying I'm the expert, but you can go all the way down the stack. And I think there is something to be said that we're now adding a new level of abstraction, and as a professional it will always be helpful to be able to look through the stack, sometimes all the way down to networking logs or the packets.

Not often, but just knowing where to look and eventually where to go. So this might be more of a philosophical question, because I think a lot of people just think, okay, we can use English for everything. But it does translate into a level below, which is programming languages, which translates into the next level, and so on. I think you're right. So here's my take on it. We're going to have a proliferation of software.

Some of the software will be built by people that don't know coding. I think it feels simplistic to say that that is not going to happen, and we're already seeing it in real time. But here's the thing. It's almost like when you think about the best developer that you know, even if they're a full-stack developer.

They probably understand, when the product is slow, whether it's because there's some issue with the way this interacts with the operating system, or some issue with the way this interacts with the networking stack. It's the ability of this person to peel back layers of abstraction to get to ground truth that makes a great developer a great developer. And these people are more powerful.

They're more powerful in any organization. You know that you can take these people and put them on any project, and it's just going to be a lot more successful with them. And I think the same thing is going to happen, which is that some set of projects... It is going to be fine if the level of abstraction you deal with is the final application plus English.

For some other set of applications, a developer will go in, but there's going to be some gnarly nature. It's going to interact with the database. It's going to have performance-related issues. And you're going to have an expectation that the AI and the human can go down the stack and the human can reason. And I think these people are always going to be really valuable. Similar to how I think actually our best engineers can, if I ask them to, go and look at the object dump.

of a C++ program and actually understand, hey, here's a function, here's a place where we're seeing a massive amount of contention, and we need to go out and fix this, right? If a developer didn't understand those fundamentals, they would be much less effective at our company because of that.

Yeah, I wonder if an analogy might be car mechanics, and how they evolved over time. My dad used to have these old-school cars where he would take apart the engine. He would take the whole thing apart and then put it back together.

Over a weekend, with all the parts laid out, I remember. And of course, by the time I got to owning a car, I could change the oil. And now I have an electric car, where there are not as many moving parts. However, someone who understands how cars work, how they're built, how they evolve.

They will always be more in demand for special cases. For example, I just had my 12-volt battery die in my electric car. I had no idea there was a 12-volt battery, but apparently there is, and I talked with someone who knows this stuff: yeah, it's a carryover from the gas cars, and this is why, and this is how the new versions will evolve.

So clearly the majority of people might not need that expertise eventually, but it is there. Plus, these are the people who understand everything, who will often take innovation forward, because they understand what came before and they understand what needs to come. You're totally right. Maybe one other thing that I would want to add to what you said is, when you look at

what great computer scientists and software engineers do, I think they're great problem solvers given an understanding of a high-level business case, or what the company really wants to do. They are people that can distill it down. And I think that skill is what it boils down to when you meet great engineers. It's not just that you tell them about a feature. You tell them about an issue,

a desired outcome, and they will go out and find any way possible to get to that. I think that's what great engineers are: they're problem solvers, and that's always going to be in demand. Now, if there's a person that builds the most boilerplate websites, and that is the only thing they are excited to do in the future, that person's skill set is going to be depreciating with time. But that's a simplistic way of looking at it.

If they were a software engineer, they should know how to reason about systems. They should be good problem solvers. I think that's the hallmark of software engineering as a whole, and they will always have a position out there. Now, since you started to build Windsurf, or even Codeium, how has your view changed on the future of software engineering? We've touched on a few things, but have there been some before-and-after moments, where now you're thinking about things differently?

I think for the timelines of a lot of things, I'm less scared of them, even though a lot of them are supposed to come out as scary numbers. I think recently Dario from Anthropic said 90 percent of all committed code is going to be AI-generated. I think the answer to that is going to be yes. My question after that is, so what? So what if that's the case? Developers don't only spend time writing code. I think there's this fear that comes from all this stuff.

I think AI systems are going to get smarter and smarter very quickly. Look, when I think about what engineers love doing, I think they love solving problems. They love collaborating with their peers to find out how to make solutions that work.

And when I look at the future, it's more like things are going to improve very quickly, but I think people are going to be able to focus on the things that they really want to do as developers. Not the nitty-gritty details where, as you said, you go home and you're like, I don't know why this doesn't compile. I think a lot of those small details,

for most people, are going to be a relic of the past. Well, I'll tell you, I'll give you an idea of why people are stressed. Some listeners will say, well, you're in an easy position, because you're in the middle of an AI company building all these tools, which is the future, right? And you're going to be fine for the next few years. And they're thinking, I'm sitting at a B2B SaaS company where I'm building software, and my employer is thinking that

These things make us 20% or 25% more efficient, and they're going to cut a quarter of a team. And I'm worried, A, if it's going to be me, B, the job market is not that great, and I get it that I can be more productive with these things, but I still need to find a job. And that is the...

Not everyone will verbalize this, but this is the thing that gets to people: when they're hearing Dario talk about the 90 percent, they're thinking, oh damn, my employer will say, okay, Joe, we don't need you anymore. Yeah, the problem is, and I don't know if this is a really good answer, but that feels like the employer being irrational. Okay, let me provide that take.

If a B2B SaaS company that is not doing well needs to compete with other B2B SaaS companies, and they reduce the number of engineers that they have, they're basically saying their product is not going to improve that quickly, compared to a competitor that is willing to hire engineers and improve their software much more quickly. I do think consumers and businesses are going to have much higher expectations for software.

What I demand from the software that I buy is way higher. I don't know if you've noticed this: I feel bad when I buy a piece of software that looks like it did a couple of years ago, like this ugly procurement software. I hear you. I see the short-term view: are there employers that look at this and think this is an opportunity to cut? I think these employers are being really, really short-sighted.

Yeah, and I'm getting a little bit of hope from other industries, too. There was a time when writers were being fired left and right. I'm not talking about software writers, just traditional writers. And now there's a big hiring spree from all sorts of companies hiring writers, because

it turns out that AI writing is a bit bland, and a great writer with AI is way better than one without. I think it's the same for software engineers. So that's also a bit of my message for anyone listening. It was just good to hear it from you.

Exactly. When you have a competitive market and you add a lot of automation, automation is great, but what you actually need to compare against is automation with a human. And if that's way more leveraged, then you should compete with that. That's the game-theoretically optimal thing to do.

And actually, that's the tool that you're building right now, which I think is one of the reasons I like to use it. It doesn't feel like it's trying to do anything instead of me; it's doing it with me and making me way more efficient as an actor. So to wrap up, I have some rapid-fire questions. I'm going to ask them and you can shoot back the answers. I've heard that you're really into endurance sports, long-distance running, cycling, and you do a lot of it.

A lot of people are thinking, well, I'm pretty busy with my job, with coding, et cetera; I don't have as much time for sports. How do you make time for sports, and what would your advice be for someone who actually wants to get in much better shape while being a busy software engineer?

So I will say this: since starting the company, that has gone down drastically. But at my previous company, where I still worked a ton, an autonomous vehicle company, I would bike over 150 miles a week rigorously, probably close to 160, 170. Interestingly, for an activity like this, I actually got Zwift, a way to bike indoors, and I would be able to knock out 20 to 25 miles in an hour at home. And the benefit there is

I can come back from work and very quickly do a ride. And then on the weekends, on a Saturday, I would dedicate time to doing potentially a 70-mile loop somewhere. One of the lucky things for me is I'm in the Bay Area, so there are a lot of amazing places to ride a bike

that have hills and stuff like that. So I think it's easy to carve out this time, but you need to make the friction for yourself a lot lower, right? I would never go to a gym rigorously; I'm the type of person that would just find a way to not do it. But if it's literally at home, right next to where I sleep, I'm going to find a way to do it. Sounds like: just make it work for you.

Yeah. And what's a book that you would recommend, and why? You know, there was a book that I read a long time ago that I really enjoyed. It's called The Idea Factory. It's basically about how Bell Labs innovated so much while being a very commercial entity. And it was very interesting to see some of the great scientists of our time working at this company and providing so much value. Information theory: Claude Shannon worked there.

The invention of the transistor happened there, shockingly, and all those people were there too. And just hearing how a company was able to straddle the line between both was really exciting. Yeah, and I hear that OpenAI got inspired by Bell Labs a lot; those titles are coming back. I personally want to read more about that, so thanks for the recommendation. Well, thank you. This was great, super interesting, and I just loved all the insights. Yeah, thanks a lot for having me.

I hope you enjoyed this conversation with Varun and the challenges that the Windsurf team is solving. One of the things I enjoyed discussing was when Varun shared how they have had a bunch of features that just didn't work out, like their code review tool, and how they celebrate failure and just move on.

I also found it fun to learn how any developer can roll out any feature they built to the whole company and get immediate feedback, whether it's good or bad. For more deep dives on AI coding tools, check out the Pragmatic Engineer Deep Dives link in the show notes. If you've enjoyed this podcast, please consider leaving a rating. This helps more listeners discover the podcast. Thanks, and see you in the next one.
