
Andrew Filev, founder of Zencoder: AI Software Engineering agents

Jul 08, 2025 · 50 min · Ep. 143

Episode description

Andrew Filev is the founder of Zencoder. Zencoder is building AI coding agents.

In this episode, we explore the evolution from simple code completion AI to more sophisticated software engineering agents. While tools like GitHub Copilot revolutionized code suggestions, the next frontier involves AI agents that can handle complex engineering tasks and collaborate with each other through emerging protocols.

The discussion dives into agent-to-agent protocols, which enable AI systems to work together autonomously on software development tasks. This advancement suggests a future where AI agents could manage entire development workflows, from requirements gathering to testing and deployment. We also touch on the importance of using slower summer periods strategically - making it an ideal time for engineering teams to evaluate their tooling, processes, and prepare for upcoming development cycles.

This episode is brought to you by WorkOS. If you're thinking about selling to enterprise customers, WorkOS can help you add enterprise features like Single Sign On and audit logs.

Links
- Zencoder
- Andrew Filev
- Wrike
- Powered by Claude
- Vercel
- Perplexity AI
- Scale AI 

Transcript

Intro / Opening

Andrew Filev

So when AI works, it is absolutely more cost efficient than human labor. Originally, the models could only solve about 4% of those tasks. In May, we submitted our agent solving 70% of the issues. An agent fails, and that engineer manually solves the problem. A vibe coder, a good vibe coder, he will try to solve it with AI agents.

The senior engineer will have the same level of AI capabilities, and the vibe coder will have a much more advanced harness and tooling and prompts and whatever. He will actually be a better AI engineer, if you will.

The future of AI software engineering

Jack Bridger

I don't know about you, but I have almost zero idea how software engineering is going to unfold with agents. But I'm joined today by Andrew, founder of Zencoder and formerly the founder of a multibillion-dollar project management tool called Wrike. Andrew helps us see one or two steps ahead of where AI software engineering is heading, so we can hopefully build better dev tools.

Andrew Filev

We already saw one complete generational change, and there's one that's coming very soon. So almost two generational changes in, like, a twelve-month period. And when those come, it's a good time to reflect and say, okay, well, if I started today, how would I approach it?

And then, well, you didn't start today. Right? You've been building towards it for the whole year. So, from what I've been building, what makes me much stronger and helps me be the best and push harder, and what is no longer super relevant in the new reality, where there's a better way to do things? And that goes to architecture, and that goes to some protocols as well.

As you know, for example, on the integration side there's MCP, which was proposed by Anthropic and then picked up by other vendors like wildfire. Right now that's the de facto standard for integrating different tools into agents. It can do many other things, but primarily where everybody's using it is bringing tools to agents. And so that became the de facto standard. That's one protocol.
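
For a concrete picture of what he means by bringing tools to agents over MCP, here is a minimal sketch of an MCP server exposing one tool. It assumes the official Python MCP SDK's FastMCP helper; the repo-search tool itself is purely illustrative, not a Zencoder feature.

```python
# Minimal sketch of an MCP server exposing one tool to an agent.
# Assumes the official Python MCP SDK ("mcp" package) and its FastMCP helper;
# the tool (searching a repo for a string) is purely illustrative.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("repo-tools")

@mcp.tool()
def search_repo(pattern: str, root: str = ".") -> list[str]:
    """Return up to 20 Python file paths under `root` whose text contains `pattern`."""
    hits: list[str] = []
    for path in Path(root).rglob("*.py"):
        try:
            if pattern in path.read_text(encoding="utf-8", errors="ignore"):
                hits.append(str(path))
        except OSError:
            continue
        if len(hits) >= 20:
            break
    return hits

if __name__ == "__main__":
    # Speaks MCP over stdio, so any MCP-capable agent or client can call search_repo.
    mcp.run()
```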

Another protocol that looks very interesting right now is the agent-to-agent protocol, as we're starting to see more agent interaction, kind of like little armies of agents that collaborate, or teams, I should say, teams of agents that collaborate, or pipelines of agents. So a lot of interesting things

Jack Bridger

have been done. Like a protocol for how they, like, define how they parcel

Andrew Filev

How they interact with each other or exchange information, yeah. Correct. And so I feel summer is a great time to kind of reflect on those things and make sure that you're ready for the next sprint. Come fall, as you know, everybody comes back from vacations and, well, you've got conference after conference, it's lunch after lunch. So, again, summer is a good time for reflection.

Evolution from code to engineering agents

Jack Bridger

Yeah. And on reflection, you mentioned the generations of code agents, or I don't know what the right word is, what would you call that category?

Andrew Filev

Right now, the more correct word is software engineering agents. It all started with coding agents. Right? Yeah. But right now they do much more than coding, so one of the things that's happening is we're moving from coding agents to software engineering agents.

Jack Bridger

Okay. So for software engineering agents, can you tell us, like, the generations? Like, even the first generation, and, like, a rough timeline, and then maybe, like, where it's going?

Andrew Filev

So the first generation that caught everybody's attention was AI code completion, and the market maker was GitHub Copilot. They were, by the way, not the first product. Some people know the prior solutions like Kite and whatnot, but most people never heard of them. So the poster child there, the household name, was GitHub Copilot. Yeah.

And it promised a lot of things, but the really sticky feature that developers liked and actually used was the code completion. And it's nice, it's helpful, but it's been somewhat controversial. And as usual, part of the controversy comes from kind of sensationalism on both sides. Like, on one side, there were claims that code completion is gonna make you, like, 30% more productive, which was not the case. And on the other side, there was also some sensationalism talking about, like, how the code is really bad, and it's gonna, mhmm,

like, ruin everything and whatever. Right? So everybody was trying to make a name for themselves. The truth, as usual, was in between. Like, it was really nice and a lot of developers liked it, but it wasn't necessarily, like, a big game changer.

So that was the first generation, and that's roughly when we started our company. And we looked at the space, and we realized that we could do a much better job at generating code, for two reasons. So one is, basically, the products back then were single-shot LLM calls, what back then you would call a thin LLM wrapper. And a lot of thought in the market was how do you buy a bunch of GPUs and make models smarter?

And, like, people were trying to make everything out of models. Like, how do we teach models to compile code and debug code and this and that, but all by the model itself. Like, try to build this kind of mega brain, this, like, AGI, super AGI. Right? Like, how do we build this mega brain thing that you just tell it to do something and it magically does it for you?

And I saw a lot of challenges with that approach, and I thought that a more efficient approach, one that gives you much more value much quicker, was to focus on what happens before the LLM and after the LLM. So before the LLM: they're extremely sensitive to the context that you give them, and the context could be your prompt; in the case of a repository, it could be the files it needs to work on; it could be higher-level context, like what's the architecture of that repository, what design patterns are being used, and whatnot. So LLMs are extremely sensitive to that because they're stateless.

So they know nothing about your specific problem, nothing about your repository. They only know what you give to them. And from that perspective, it reminds me of the good old 50 First Dates movie. I don't know if you remember that one.
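
Because the model starts from zero on every call, whatever context you want it to have must be assembled and sent with each request. Here is a minimal sketch of that idea; `call_llm` is a hypothetical stand-in for whichever model API you use, and the prompt layout is only an illustration.

```python
# Minimal sketch: a stateless LLM only sees what you put in the request,
# so task, relevant files, and high-level repo notes are packed into one prompt.
from pathlib import Path

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call (OpenAI, Anthropic, etc.);
    # kept abstract so the sketch stays provider-neutral.
    raise NotImplementedError

def build_context(task: str, files: list[str], architecture_notes: str) -> str:
    parts = ["## Architecture notes\n" + architecture_notes, "## Relevant files"]
    for name in files:
        parts.append(f"### {name}\n{Path(name).read_text(encoding='utf-8')}")
    parts.append("## Task\n" + task)
    return "\n\n".join(parts)

def solve(task: str, files: list[str], notes: str) -> str:
    # Nothing is remembered between calls; the context travels with every request.
    return call_llm(build_context(task, files, notes))
```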

Jack Bridger

I still have a stigma.

Andrew Filev

The character had amnesia, and so, basically, every single day was a brand new day for him. Right? Or for her, I should say. So we have the same thing here. Like, every single time you ping an LLM, it knows nothing about you or your project or whatnot.

Right? So they're extremely sensitive to the context. That was one area where we made significant improvements. And then the other one is that it doesn't have to be just the model itself. Humans, our brains did not evolve much over the last twenty thousand years, but you'd probably agree that our culture and the things that we can do evolved significantly.

Right? And it comes back to knowledge, culture, and tools. And tools are extremely important. And so I always thought, hey, why would I try to teach an LLM to be a compiler when I could teach it to use a compiler?

Right? Like, why would I try to make it a calculator if I can teach it to use a calculator? So tools are extremely important. And one of the tools that we gave to LLMs, even in that first generation, was a tool to check the syntax of the code. So if the LLM generates incorrect code, we can identify that.

Right? And then we're getting to the next super important point, which is that then you can build the feedback loop, which is extremely important for any intelligent system. Like, even if you personally just wanna walk a straight line, a super easy task, if you try it with your eyes closed, you know it's not gonna be as easy. Right? So we always need that feedback loop, and that was part of it.
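
The syntax-check tool plus feedback loop he describes can be sketched in a few lines. `generate_code` is a hypothetical model call, and Python's built-in `ast` module stands in for the syntax checker; this is an illustration of the loop, not Zencoder's implementation.

```python
# Sketch of the generate -> check -> feed-errors-back loop described above.
# `generate_code` is a hypothetical model call; ast.parse plays the role of
# the syntax-checking tool that gives the agent its feedback signal.
import ast

def generate_code(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real LLM call

def generate_with_feedback(prompt: str, max_attempts: int = 3) -> str:
    feedback = ""
    code = ""
    for _ in range(max_attempts):
        code = generate_code(prompt + feedback)
        try:
            ast.parse(code)   # the "tool": a syntax check
            return code       # loop closed: the code is at least parseable
        except SyntaxError as err:
            # Feed the error back so the next attempt can correct it.
            feedback = f"\n\nYour previous attempt failed to parse: {err}. Fix it."
    return code
```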

And now if you think about those things that I just described, like the right context, the feedback loop, the tools, that's what today we call agentic systems, right, and what we call agents. So that's kind of how we started on that first generation. And we very quickly built a product that was able to generate code better than direct competitors. But as we then tried to commercialize that product, kind of build the actual solution that's nice and polished with great UX and works here and there and everywhere, Anthropic shipped Claude 3.5 in October of last year, and that was the first truly agentic model that could successfully use tools, that could explore the engineering environment, could operate with bash commands and whatnot. And that was the game changer that allowed us, and our competitors, and the whole market to move to the second generation of products, which is today the dominant kind of AI use in coding: you have capable agents that work inside of your development environment.

Right? So you're still in control. It's still kind of human in the loop, or I would not even say human in the loop, it's human in control. But you have very powerful agents, starting with a coding agent and then rapidly evolving into all sorts of other related agents. So, for example, you could have a code review agent or this or that. So, mhmm, that's the category as it is today.

Jack Bridger

Scaling DevTools is sponsored by WorkOS. If things start going well, some of your customers are gonna start asking for enterprise features. Things like audit trails, SSO, SCIM provisioning, role based access control. These things are hard to build, and you could get stuck spending all your time doing that instead of actually making a great dev tool. That's why WorkOS exists.

They help you with all of those enterprise features, and they're trusted by OpenAI, Vercel, and Perplexity. And if you use them for user management, you get your first million, yes, million monthly active users for free. I honestly don't know any dev tools that have a million monthly active users apart from GitHub maybe. So that'll get you pretty far. Here's what Kyle from Depot has to say about WorkOS.

Kyle (Depot.dev)

We use WorkOS to effectively add all of the SSO and SCIM to Depot. It's single-handedly, like, one of the best developer experiences I've ever seen for what is, like, a super painful problem if you were to go and try to roll that yourself. So for us, we can effectively offer SSO and SCIM, and it's like two clicks of a button, and we don't ever have to think about it. It's like one of the best features that we can add to Depot. It's super affordable, which effectively allows us to, like, break the SSO tax joke and essentially say, like, you can have SSO and SCIM as an add-on onto your monthly plan.

Like, it's no problem. So it really allows smaller startups to essentially offer, that enterprise feature without a huge engineering investment behind it. Like, it's literally we can just use a tool behind the scenes and our life is exponentially easier.

Key factors driving AI breakthroughs

Jack Bridger

Yeah. Actually, Andrew, sorry, just a question on, maybe this has an obvious answer and is not interesting, but why was it that 3.5 was, like, the first one that could do tool calls? Like, do you have any understanding of why, what it

Andrew Filev

Well, all the stars pointed there; it was just the first to get there. Right? The stars were pointing there. I think all the labs are working on more agentic models. As for a big part of the breakthrough, obviously, the frontier labs stopped disclosing what actually happens.

Right? So you can only guess, or you can maybe have retro-vision when open source gets similar results. Right? You look at open source and you're like, oh, maybe the frontier labs did the same thing. So, I think there are two important things.

So one is just the accumulation of the right data. As you know, there are companies like Scale and Surge that have made billions of dollars right now on just generating the right data. There are actual senior engineers working there, generating what in the industry are called trajectories, to teach those agents how to do different things. So one thing is the natural trend of accumulating data, and the data getting more and more advanced, because as the labs build the foundational data, they then focus on generating the next level of data, and the next level. So they accumulate more data, and more importantly, they accumulate data at a higher and higher level that teaches the models to do more and more things.

So that's one component. And the other component that's extremely important is reinforcement learning. The labs, I think, started to figure out how to build the right kind of training environment where the agent can attempt solving a real engineering problem multiple times. And then you know which solution is correct, and you can feed that information back into training. And part of this, those things kind of look small and subtle, but they actually matter a lot.

I think a big part of the reason why the industry progressed so fast in the last twelve months is one particular benchmark called SWE-bench, SWE standing for software engineering. Prior to that benchmark, the models typically reported results on kind of Olympiad-style coding. Right? And that's not what happens in real engineering.

And for the bystander, it all looks the same. Like, you got code here, you got code there, it's all the same, potato potahto. But it's not. Like, if you're an engineer, if you work in there day to day, you know that there's a huge difference between trying to build a large system that's used by millions of users versus trying to come up with a more optimal way to sort an array. Right?

So those are very different tasks, and the training and knowledge the models get from these tasks has very little transferability. There's some transferability at the basic level. Think about it as you playing hockey versus you playing basketball. Right? Like, there's some transferability of your athleticism.

But at some point, those are very different sports that require very different skills. And so originally, the models kind of paid attention to those simpler coding tasks. And then at some point, when they reached good results there and when it became clear that that's not enough to solve real engineering problems, people came up with this benchmark, SWE-bench. And originally, the models could only solve about 4% of those tasks. And that benchmark had real-world issues.

They scraped real-world open source repositories for documented issues, and they had a gold-standard solution to those issues. And not only that, they also tried to find solutions that had unit tests. So even if a candidate solution is slightly different, you can still test whether it works or doesn't work. The benchmark they built originally had, I think, more than 2,000 tasks. I don't remember the exact number.
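
The mechanics he describes, a scraped issue, a candidate patch, and the repo's own unit tests as the judge, can be sketched roughly like this. The paths and test command are illustrative, not the actual SWE-bench harness.

```python
# Rough sketch of a SWE-bench-style check: apply the agent's patch to the repo
# at the issue's base commit, then let the project's own unit tests decide
# whether the issue counts as resolved. Paths/commands are illustrative.
import subprocess

def is_resolved(repo_dir: str, base_commit: str, patch_file: str, test_cmd: list[str]) -> bool:
    subprocess.run(["git", "-C", repo_dir, "checkout", base_commit], check=True)
    applied = subprocess.run(["git", "-C", repo_dir, "apply", patch_file])
    if applied.returncode != 0:
        return False                      # the patch doesn't even apply
    tests = subprocess.run(test_cmd, cwd=repo_dir)
    return tests.returncode == 0          # e.g. ["pytest", "tests/test_issue.py"]
```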

And again, originally, the models back in the day could only solve about 4%. But, as you know, if you can measure something, you can improve it. So there was a target put in front of the frontier providers to improve, and they started building more data and better reinforcement learning setups to have models be capable of solving those issues, and there was rapid progress. And so, for example, in May we submitted our agent that used a mixture of models, the primary worker there was Claude 3.7 because 4 was not available yet, and we were the number one result on that SWE-bench benchmark.

Solving 70% of the issues. And imagine that, like, those are real-world problems, and we could solve 70% of those problems. That's very impressive. There's another similar benchmark, where OpenAI scraped real freelancer jobs that got people paid. And that benchmark is kind of funny and interesting and catchy because you can measure your success not only as a percentage, but in actual dollars.

You could be like, hey, I solved $70,000 out of a $200 pool. And on that benchmark, again, at the time, we had the best result, beating everybody by 20% at solving the kind of higher-level engineering problems from the benchmark.

Jack Bridger

Can you also measure whether it's profitable to run it?

Andrew Filev

Yeah. That one is definitely profitable. Yeah. So when AI works, it is absolutely more cost efficient than human labor. Well, like, I'm obviously a proud user of our own product, but I also pay for other products. I have the top plan on Anthropic, and I use it all the time. I have the top plan on OpenAI. And the way I'm thinking about it, the more I'm paying, the more I'm saving. It's not like, yeah, I'd rather only pay them $20.

I'm super happy to pay them $200, because then I can offload some tasks to them that would otherwise require an executive assistant or somebody to find all this information and whatever. So, yeah, to me, again, when it works, that's the caveat. Mhmm. When it works, then the more you use it, the better, and correspondingly, the more usage you have, typically the more you pay for it, because those models are pretty expensive on the compute side.

Right? So from that perspective, there's no free cheese. But, again, my salary is much more than $200 a month. So is yours, and so is most people's. So the saving is definitely there.

Jack Bridger

Yeah. And sorry I interrupted you there, go ahead. Just wanted to ask about profitability. But yeah. So sorry.

SWE-bench

So you were saying you were able to really be number one on SWE-bench, as well as to do really well on the one that's, like, presumably like Fiverr.

Andrew Filev

So, that's SWE-Lancer. Yeah. And there are a couple of versions of SWE-bench. We're number one in SWE-bench Multimodal as well, which measures your performance on multimodal tasks, because part of engineering is not just text. Right?

You try to implement solutions that, for example, replicate the wireframes that you're given, or fix some visual issue on the website. So that's exciting. And then, I think rolling back, we were talking about the generations of agents. Right? So we started with code completion. We continued to those coding agents in the IDE, which are all the rage right now. People love them. Some people hate them, but I think most people like them. And

Jack Bridger

so, Cursor, Windsurf.

Andrew Filev

Cursor. Yeah. And there are actually two different audiences, I'd say, for the agents. And both get value from them, but the usage patterns and the value are quite different. So there's what we call in the industry vibe coders.

Right? So, people who kind of build simple projects, typically prototypes. A lot of them are not engineers. They might still be technically inclined people and smart people, but not engineers, they don't know how to code. And so they, again, create those simple apps from scratch. In the industry it's called vibe coding, and that's fun.

And most professional engineers occasionally do that as well, for one reason or another. So, for example, my kid wanted to build a mobile app, and I haven't touched mobile development in forever. And I just opened Android Studio, installed Zencoder in it, and then, using Kotlin, which is a programming language that I never wrote in, created this mobile app for my kid, just to show him what he can do with today's modern AI. Or our chief operating officer, who runs go-to-market. Right? He vibe coded an app for his friend who needed some simple ecommerce website.

So that's one use case, pretty popular, and that's where today you can see that 10x from AI, where, again, it's to some degree kind of infinite x. Right? Because the person otherwise might not be able to create that app at all. And then there's a very different setting of professional engineers, and they do not yet see that 10x. They already see good gains, but it's not yet 10x, and there's a variety of different reasons for that.

So in a professional setting, you typically have either one very large repository for your code, that's called a monorepo. Right? And it's very hard. And usually, it's gnarly. Right?

Like, that monorepo has some tech debt. I don't know if you've ever been to Silicon Valley. There's a building here called the Winchester House that's very famous, and it was deliberately built by the owner to be very hard to navigate. There are, like, stairs going nowhere or doors opening into the walls and whatever. So a lot of those monorepos, they have those components. So even your best engineers have a hard time navigating them, and LLMs with their limited context definitely have trouble there.

So that's one complication. Another very frequent complication is, instead of one super big repo, you could have, like, 200 smaller repos and a bunch of microservices, and, again, it's all kind of spread over. And even for humans it's hard to grok, and for the models even harder. So you've got that complexity, a lot of proprietary APIs that the models were not trained on. The models also don't get the choice of library, because when people vibe code, they usually stick to some very modern and frequently used technologies.

So it'd be, you know, Node.js, TypeScript, some AI-friendly front-end library like shadcn and whatnot, as opposed to when you're working with existing applications. Some of them people started building five years ago, ten years ago. They're using languages like Java. Java developers, by the way, like to work in different IDEs.

Mhmm. They work in JetBrains and whatnot. So there's a lot of that complexity. And right now, you can still get a lot of benefit from agents, but you need to know how to wrangle them, if you will. Right?

So the engineers who get the most value are oftentimes very good at writing custom AI instructions, understanding how to run those agents, how to operate those agents. And part of what we're doing is helping people cross that barrier. Right? Doing some of that work for them with our repo grokking stuff, or helping them share the knowledge inside of organizations. Say you're that innovator and early adopter and you came up with your super cool AI instructions and you built your agent.

Can you share that agent with me so that I don't have to think? I can just click a button and get the same benefits. So that's

Jack Bridger

and examples of that might be, like, what kind of examples are you seeing people

Andrew Filev

share? I can give you a real-world example from our company. So, the coding agent doing a better job on our product. Right? Let me just step back a little bit.

Our product: we have software agents, and we deliver them on different, what I call, surfaces. Right? So you can run them here or you can run them there. A very important surface is developers' IDEs, and we support VS Code from Microsoft, and we support the whole family of JetBrains. So those are different platforms, VS Code or JetBrains.

And if you run our coding agent on our repository just plain vanilla, it will have better results on VS Code than it will have on JetBrains. And so one of the things we had to do is figure out, okay, what's breaking, and how could we make sure that we give the right context to our coding agent, and the right instructions, so the code that it produces for JetBrains is still good code. There are no hallucinations. It uses the right APIs.

And then JetBrains is a complicated ecosystem, because it's not one IDE. They've got, yeah, like a dozen, and there are different versions with sometimes very different APIs, like between 2023 and 2024 and 2025. So you gotta be very specific in, like, what sort of APIs you wanna use and what sort of architectural patterns you wanna use, for that code to be compliant with your guidance. And so that's an example where, ideally, you do it once and then you share it with the rest of your team so that they can benefit from it.

Another example is, again, from internal use, but you can see how it applies to everybody. As part of our offering, we not only build coding agents, we build testing agents as well. We should talk separately about it, it's a very important topic. But, anyway, those testing agents: as you know, if you wanna write automated tests, there's some setup that needs to happen.

Agents in DevOps Pipelines

So for example, if you try to test the billing page, right, you need to somehow mock the credit card checks or this or that, because you should not test it with your real credit card data. So there's gotta be some setup that you do in order to make that agent successful. But once you do that setup, it's awesome. Right? You write a piece of code and then you call the testing agent, and it creates a bunch of scenarios and scripts for you that you can then put in your regression suite or smoke suite or whatnot.
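
The kind of one-time setup he means, making the billing flow testable without real card data, usually looks like a mock or fake payment gateway that the generated tests can rely on. A pytest-style sketch; `app.billing`, `charge_card`, and `checkout` are hypothetical names used only for illustration.

```python
# Sketch of the setup that makes a billing page testable: the real card check
# is mocked so agent-generated scenarios can run safely.
# `app.billing` and its functions are hypothetical names for illustration.
from unittest.mock import patch

import pytest

@pytest.fixture
def fake_gateway():
    # Replace the real charge call with a canned "approved" response.
    with patch("app.billing.charge_card", return_value={"status": "approved"}) as mock:
        yield mock

def test_checkout_marks_order_paid(fake_gateway):
    from app.billing import checkout  # hypothetical module under test
    order = checkout(cart_total=49.99, card_token="tok_test")
    assert order.status == "paid"
    fake_gateway.assert_called_once()
```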

So those are examples of how there's kind of institutional knowledge that you need to first package and share with AI so it can reach your standards of performance. And then once you've done that, ideally you also wanna share it with the rest of your organization, so it's not just one person benefiting from that, but everybody else. And then one other interesting thing that, by the way, brings us to the next generation of products: what's happening right now is that as people are getting better and better at doing that in their IDEs, and those agents, with human guidance, get better and better at performing sort of simple tasks, we get to the point where you can now trust them, again, in certain scenarios, and that means that you can run them autonomously, which means that there's a need for a different surface, because the IDE is obviously dependent on you coming and kind of hitting the button. Right?

So one of the most natural surfaces for that is your existing CI/CD process. Every engineering company has their DevOps pipeline, and in that pipeline you might have your pre-commit or post-commit hooks and whatnot. And so it's very logical, for example, if you develop your own code review agent that does a good job of checking your code against your guidelines, your standards, and if you're happy with its results, then you can easily drop it into your CI/CD, and it will run automatically on every commit. So this is a new offering from us, and people love it and it's exciting, and I feel that that's part of where the future is going, where there are gonna be more and more of those agents that run autonomously in your pipeline and otherwise.
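
As a sketch of what "drop it into your CI/CD" can look like, here is a small script that could run as a pipeline step: it collects the latest commit's diff and hands it to a review agent. `review_diff` is a hypothetical stand-in for whatever agent you have built, not a specific Zencoder API.

```python
# Sketch of a drop-in CI step: gather the latest commit's diff, let a review
# agent check it against your guidelines, and fail the pipeline on findings.
# `review_diff` is a hypothetical stand-in for your own agent, not a real API.
import subprocess
import sys

def review_diff(diff: str) -> list[str]:
    raise NotImplementedError  # call your code-review agent here

def main() -> int:
    diff = subprocess.run(
        ["git", "diff", "HEAD~1", "HEAD"], capture_output=True, text=True, check=True
    ).stdout
    findings = review_diff(diff)
    for finding in findings:
        print(f"review-agent: {finding}")
    return 1 if findings else 0  # non-zero exit blocks the pipeline

if __name__ == "__main__":
    sys.exit(main())
```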

And what's interesting is that those generations are not necessarily fighting with each other. Meaning, you can use coding agents today and still use code completion. And similarly, with the third generation, you can use agents in CI/CD, but at the same time still use your coding agents in your IDE to work on the more complex problems, and then you can fall back to code completion if you wanna write code by yourself. So those are just, again, different capabilities, different surfaces, but they're not contradicting each other, and I can see how all three will continue to stay relevant. It's just that the next generation opens up essentially 10 times more use.

For example, if you run autonomous agents, you can do some things that are harder to do in the IDE. So, for example, if you have a more complex problem, you could blast five agents. And as long as you can verify their results, you can increase the rate of solving the problem. Right?

As opposed to, typically in the IDE, you wouldn't run an agent five times. It's just a little bit boring and time consuming to do that. So you can do things like that. You can pipeline agents more efficiently. You can script larger kind of migration projects, for example, when you wanna run certain pipeline arrangements across 10,000 files.

So just because it can run in the background, because you can blast many of them at the same time, I feel the aggregate usage there is gonna be 10 times more than what we have today in IDEs.
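
The "blast five agents and keep whichever result verifies" pattern is easy to sketch: run the attempts concurrently and accept the first candidate that passes your check. `run_agent` and `verify` are hypothetical stand-ins for your own tooling.

```python
# Sketch of fanning out several autonomous attempts and keeping a verified one.
# `run_agent` (produces a candidate patch) and `verify` (e.g. runs the tests)
# are hypothetical stand-ins for your own tooling.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_agent(task: str, seed: int) -> str:
    raise NotImplementedError

def verify(candidate: str) -> bool:
    raise NotImplementedError

def solve_with_fanout(task: str, attempts: int = 5) -> str | None:
    with ThreadPoolExecutor(max_workers=attempts) as pool:
        futures = [pool.submit(run_agent, task, seed) for seed in range(attempts)]
        for future in as_completed(futures):
            candidate = future.result()
            if verify(candidate):   # only verified work gets through
                return candidate
    return None                     # no attempt passed verification
```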

Jack Bridger

Yeah. That feels very much like you're getting close to, like, the kind of Devin promise of, like, you've just got, like, a software engineer. It's getting kinda close there. It seems like it's more like directed forms of, like, autonomy, and maybe, like, the next step is...

Andrew Filev

I'm very opinionated, just because of how they came to market, and they basically, I think, alienated everybody. Because first of all, they promised stuff that wasn't working. And even today, right, their agents are slow and they're not best in class. So why would you take something that's slower and at the same time not the best solution? And then also, I don't like the positioning.

So I'm not in this business to replace engineers. I'm in this business to help everybody ship more software. Right? Part of my passion and motivation is: I'm an engineer by training, but I'm a product guy by heart, and I always get a bunch of ideas. And even in my previous business, which at its peak had about 1,200 people, so several hundred people in the R&D department,

I think only about 5% of my ideas were ever implemented there. There's never enough time, never enough engineers. And a big chunk of that work, anybody who has worked in corporate or enterprise settings knows that a huge chunk of the work is not super creative. And sometimes it's not even coding. Right? It's just that that whole pipeline takes a while, and then when you code, it's not like you're just working on new features.

A lot of it, you're fixing tech debt, you're doing migrations, you're fixing small bugs, implementing these nice little features, or working on internationalization, working on accessibility, and a lot of other important things. Right? Making sure that the solution is secure and scalable and whatnot. So there are a lot of those things and a lot of those checks that are important when you're shipping commercial-grade, enterprise-grade software. And so I feel that AI could help us take care of a lot of that so we could move much faster.

So that's the goal. And again, going back to Devin, I think the positioning was kinda weird. It's like, as an engineer, why would I spend four hours of my day kind of babysitting Devin if I don't get the credit for the PR? Right? That could work in a startup if I'm the owner and the engineer and everybody.

Right? But in a commercial setting, like, you gotta give the credit where it's due, and we're here to help. We're also here to shift left. I think this is one of the biggest unlocks in release velocity. So when I talk to VPs of engineering today, mhmm,

again, we're not yet seeing that 2x increase in release velocity that AI could deliver. Well, like, I think today's AI technology, for a lot of companies, already could deliver that 2x, even if we don't get the next generation of models, which we will. So the question is why? And I feel part of it is, again, it's not just coding. So a very frequent scenario is you code something and then you throw it over the wall to QA, and then, you know, they're working on something else, so it takes them several days to get to it.

They check it and, bam, they find some obvious thing that was your job to find, but for whatever reason you didn't. They throw it back to you. You're already working on something else, so you gotta wrap that up, get back to this thing, get distracted, fix it. And all of that accumulates into context-switching costs and, more importantly, into calendar delays. As opposed to: you write the code, or a coding agent writes your code, and then tests are created for you by a different agent, right, which catches some of those more obvious issues. Then it can save a lot of cycles for everybody, and you can kinda stay in the zone, you can focus on important things, and the agents can help you do the stuff that you don't like doing as much. Good. And then I feel there are still important roles for people in all of those stages, because, for example, if you're generating more code, then architecture becomes more important, and somebody has to think about it.

And today, yeah, it might change with the future generations of models, but today I wouldn't trust hard, complex architectural decisions completely on autopilot to the models. Like, you could use the model as a sparring partner, right, as your collaborator. You could ask it to generate some ideas or ask it to validate some of your ideas, but you gotta be making those decisions. Right? Or, yeah,

you mentioned QA. Right? Like, if there's 10 times more code, 10 times more code needs to be tested. So, yes, AI has gotta be there, but at the same time, who's gonna create the frameworks for the tests? Right? Who's gonna make sure that it all works well? So I feel the more AI can do for us, the more sophisticated a job we actually have to do to orchestrate that AI.

Jack Bridger

Yeah. Yeah. I think that makes sense, and that's a future that I definitely welcome, that we're still gonna be of use. One of the questions I really wanted to ask you, and maybe why this is the slightly more technical, slightly more in-depth episode in terms of the actual technology, is that I think it's really hard to think about how developer tools evolve without thinking about how code is written and how people work and stuff. And so I wondered if you have, like, any views on, you know, if someone's building, you know, like a feature flagging API, for instance, how they should be thinking about how this is evolving, how engineers are working, mhmm,

The Evolution of Developer Tools

And what it means to kind of build a dev tool.

Andrew Filev

Great question. So, a couple of points. First, let me build off the previous conversation. Right? I actually recently had this discussion with my team, which is kind of an interesting insight. As we go deeper and deeper into this rabbit hole, engineers are gonna do more testing, and testers are gonna do more engineering. It's because, again, the testers' job is now gonna be more complex.

It's not like they're gonna have less work. They're gonna have work, and that work is gonna be more advanced. Right? They're gonna need to create those frameworks and orchestrate those agents and whatnot. And then engineers, again, if their routine stuff is taken away, then part of their job, like, again, there's 10 times more code.

They gotta look at what's been generated and then see that it's actually what needed to be built. Right? And, of course, there's gonna be more architecture work, but that's less controversial. Right? Less, less money.

But the fact that, again, there's a little bit of cross-pollination is an interesting one. Then, in terms of, again, the evolution of the profession, you brought up the interesting example of building a feature flagging API. Again, an interesting bit: if you're working on something that's not core to your business, so it does not add differentiation to your company, does not help you position against competition, I think there's more and more incentive for people to consider open source, not just as consumers of open source, because that's a no-brainer. Right?

Like, if there's an open source library that does feature flagging for you, just use it. Right? Why would you not? But also to contribute into open source, because if you do that, then the next version of the model, I'm not even saying the next generation, just the next drop of the model, is gonna pick up your open source code and your open source documentation.

And so you, for free, are getting something that's very hard to get. You're getting frontier models trained on your specific piece of code. So I think that, and the fact that there's just so much more software being written and that there are so many more software companies. Like, previously, when I started in the SaaS industry, there were fewer players. And so a lot of this, well, like, we were building our own SSO, for example, right, and our own SAML.

And if I open sourced it, I also knew that my competitor might pick it up. Right? And it might have been a differentiator. Like, I might have been the first company to ship SSO in my category. As opposed to today, I mean, all B2B SaaS has SSO one way or another.

Right? So there's no reason to keep it proprietary. Like, I assume every good software shop will have feature flagging. So there's more and more incentive to contribute to open source. And as I said, yeah, you get the free benefit of training frontier models, with no GPUs required from you.

Jack Bridger

I should say that the podcast is sponsored by WorkOS. I think, like, if people are looking for single sign-on and SAML, they should just go use WorkOS, though. Right? Yeah.

Andrew Filev

Yeah. Yeah. Yes, exactly. Yes. Let's stick to feature flagging. But feature flagging is a simpler component. Well, like, they would do it. In my defense, the SAML and SSO, that was a pretty heavy piece. Like, in my previous business, we coded it, we supported it.

The same goes for integrations. Right? Well, like, everybody used to write their own integration to Google Suite, right, or to SharePoint and whatnot. Right? And so, like, there's no reason to do that. Yeah. And then there

Jack Bridger

That's an interesting point. I think also, maybe, like, the internals, I guess, because every dev tool, whether they're open source or not, their documentation's gonna be, like, out in the open. They're gonna put that out there a lot. But maybe once these things get really advanced, the fact that they know the actual insides, not just the surface area of the API, like, they can see the raw code, they know how it works. It might be, like, yeah,

be able to, like, preempt why things are not working and stuff like that. It'll be like

Andrew Filev

Well, you just gave me an interesting idea. So, it doesn't exist today. Right? But what if you could publish your internal API docs to LLMs, but in a way that would not get them picked up by Google? Right?

Because right now, they obviously work off the same index. And so if you do that, you're kind of exposing your internal APIs, again, to your competitors, who could learn from that. Mhmm. But what if we could expose it to LLMs without it being indexed by Google? But it's interesting. I also think the deeper we go into the AI world, the more just sheer speed is an advantage, as opposed to kind of, hey,

here's how I've done things, or, like, I implemented it six months ago. Well, guess what? Six months in the AI age is a lot of time. So if the competitor learns from what you built six months ago, maybe that's a good thing for you, because you're already building it in a very different way today. Right?

So I think there's less and less of that proprietariness and more and more value in just speed, understanding of your customers, understanding of the market, understanding how to build AI into your product offering and how to leverage agents in whatever you're offering to your customers. Yes. And, again, less and less in the proprietary. And then going back to your question about the evolution of the profession, one other interesting, kind of funny, semi-controversial insight, that's good, is, like, I noticed that, you know, people hire senior engineers.

Right? And they do it for a reason. Those engineers are very skilled, experienced, and bring a lot of value to the table. And then those engineers sometimes try agents. An agent fails, and that engineer does what he's paid to do.

He comes and manually solves the problem. Awesome. And then you compare it with a vibe coder, who does not have that experience, might not even know the programming language. And when the AI agent fails, the vibe coder, a good vibe coder with the technical chops and an engineering mindset, he will try to solve it with AI agents. So he will write some AI instructions, play with them, or whatever.

And so if you fast-forward that, like, a week or two, right, the senior engineer will have the same level of AI capabilities, and the vibe coder will have a much more advanced harness and tooling and prompts and whatever. And so he will actually be a better AI engineer, if you will. And I mean engineering in the true sense of the word, which I respect, like somebody who creates and builds, and whenever you create and build, you always face problems, and an engineer is the person who solves the problem. Right? So from that perspective, that vibe coder, again, could do something that's amazing.

And I've seen this also work with senior engineers as well. For example, in one of the companies, they're working on a very large migration, and they absolutely do not have enough humans to manage that migration. So they cannot do it manually. Well, like, there's a constraint. Right?

And as you know, constraints breed ingenuity. And so all they can do is work on prompts for agents, to make sure that the agents can do that. And, again, they have one of the most interesting and sophisticated setups that I ever saw, kind of like hierarchical prompt templates and whatnot, that pick up from environment variables, and there's a workflow on top of that as well with multiple different agents, and then there's, like, execution of that workflow. So that's something for senior engineers to think about, if they wanna stay competitive in this market and if they have that internal drive to be able to build faster and better and be the best in the area. Well, like, maybe sometimes they need to create those artificial constraints and try to do something without using their prior knowledge.

And then they're unstoppable, because then they get the best of both worlds. Right? They get their AI chops and they get their, like, ten years of experience in context and whatnot.
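
A toy sketch of the kind of setup he describes, layered prompt templates that pick up values from environment variables before being handed to a workflow of agents; all names, variables, and steps are illustrative, not the actual system he mentions.

```python
# Toy sketch of hierarchical prompt templates filled from environment variables,
# then handed to a multi-step agent workflow. All names are illustrative.
import os
from string import Template

BASE = Template("You are working in the $REPO_NAME repository on branch $BRANCH.\n")
MIGRATION = Template("Migrate $MODULE from $OLD_API to $NEW_API, preserving behavior.\n")

def build_prompt(module: str) -> str:
    env = {**os.environ, "MODULE": module}
    return BASE.safe_substitute(env) + MIGRATION.safe_substitute(env)

def run_workflow(modules: list[str]) -> None:
    for module in modules:
        prompt = build_prompt(module)
        # In the real setup each step would go to a different agent
        # (plan -> edit -> test), layered on top of the same base template.
        print(f"--- dispatching agent for {module} ---\n{prompt}")
```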

Jack Bridger

Wow. Style vibe coding, that's what I'm taking away. Amazing. Andrew, that was really, really awesome. If people wanna learn more about Zencoder, where should they go? Zencoder.ai. Awesome. Any shout outs to make, any plugs?

Andrew Filev

Well... oh, sorry. That is an awkward pause. That's probably better cut out. But

Jack Bridger

Maybe we should just keep it, because it's more fun now. Yeah. Because, but, you're not here to plug. You didn't have a plug instantly ready to go. So yeah. There's more

Andrew Filev

interesting... It's an exciting space. I think all engineers should be playing with coding agents today. They don't know it yet, but I think they also should be playing with testing agents. So we'd love them to try our internal testing capabilities. And then I think it's also a good time, for the ones who wanna stay ahead of the curve, to start playing with drop-in agents in CI/CD and figuring out good uses for that.

And then for everybody, I think it's time to broaden the perspective from coding to the software development life cycle. Right? It is a process. And, again, unless you're a solo vibe coder, if you're working in a real organization at a corporate, enterprise level, it's a process: multiple different steps, multiple different tools, sometimes multiple different teams, and a lot of time could be saved in optimizing things across the process rather than just coding. But coding is the most fun part, and it kind of sits at the core of it.

Better prompts with Claude

Amazing.

Jack Bridger

Also, thank you very much to Amy from Anthropic for making this episode happen, and it's coinciding with the launch of the enterprise... well, Andrew, maybe you can share what it is, because you're part of that.

Andrew Filev

Yeah. They're launching the enterprise partner directory, which is exciting, and it comes on the heels of their recent Claude 4 launch, which is an amazing set of models. Sonnet is a good kind of workhorse for daily tasks, especially in software engineering, and then Opus is kind of a bigger-thinker model. And I like using both, and I also personally use their research mode quite often for different kind of executive assistant type tasks that I don't have time to do myself.

Jack Bridger

Yeah. I actually used it for making this, like, prompt. I was trying to do some stuff with ElevenLabs, the voice stuff, and it generated this amazing prompt. It, like, went and researched lots of people, and it was really good.

Andrew Filev

Yeah. By the way, if any of your listeners are not doing it yet, that's one of the best things to do. Like, if you're facing a complex problem and AI isn't working really well for you, try using AI to help you generate a more extensive prompt. And you can kind of guide it and say, hey, here's what I'm trying to accomplish.

Here's where I'm failing. Could you help me generate this mega prompt that will alleviate this issue in the future? Especially if you give it access to some tools. For example, I've recently done it with a piece of our code, and I gave an agent access to the code files so it could extract valuable information. But you could, of course, use different tools, different connections. Like, if you're working on emails, you could probably give it an MCP to access your Gmail or something.

Right? Working on a calendar, you give it access to the calendar and kind of guide it, and it can look up past information, pick up some patterns, and create a good prompt that you can then copy-paste, or create Zen Agents, and reuse in the future.
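
A hedged sketch of that trick, asking a model to draft a more extensive "mega prompt" from your goal, your failure notes, and a couple of files it is allowed to read. It uses the Anthropic Python SDK as one possible client; the model name and file paths are only examples, and any tool access (MCP connections to Gmail, calendars, etc.) is left out for brevity.

```python
# Sketch of using a model to draft a reusable "mega prompt" from your goal,
# your failure notes, and a few files. Uses the Anthropic Python SDK as one
# example client; the model name and file paths are illustrative.
from pathlib import Path

import anthropic

def draft_mega_prompt(goal: str, failure_notes: str, files: list[str]) -> str:
    context = "\n\n".join(f"--- {p} ---\n{Path(p).read_text()}" for p in files)
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # example model name
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                f"Here is what I am trying to accomplish:\n{goal}\n\n"
                f"Here is where I keep failing:\n{failure_notes}\n\n"
                f"Relevant material:\n{context}\n\n"
                "Write an extensive, reusable prompt I can use to get this done."
            ),
        }],
    )
    return response.content[0].text
```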

Jack Bridger

Yeah. It's a great tip. And also, I love the phrase mega prompts, by the way. I feel like that should be common parlance. Okay. Well, that's it. We'll say a second goodbye. Thanks, everyone, for listening.

Andrew Filev

Thank you.

Jack Bridger

Thanks, Andrew.

Andrew Filev

Thanks for hosting me, Jack.
