
Measuring and Improving Developer Experience 📊 — with Abi Noda

Apr 18, 2025 · 43 min · Season 4 · Ep. 1
Listen in podcast apps: Metacast · Spotify · YouTube · RSS

Summary

Abi Noda, CEO of DX, joins Luca Rossi to discuss the complexities of measuring developer productivity and experience. They explore the common misinterpretations of the Accelerate book's Dora metrics, emphasizing their role as diagnostic outcomes rather than direct targets for improvement. The conversation highlights Developer Experience itself as a crucial input for driving productivity, identifying key factors like focus time and knowledge management. Finally, they analyze the current state of AI in development, contrasting social media hype with real-world impact and potential beyond just coding.

Episode description

Today's guest is Abi Noda, the CEO and founder of DX, one of the leading engineering intelligence platforms. With Abi, we talked about measuring developer experience. We started with the early days of Accelerate and why we feel like most people got the book wrong. Then we continued to the present day and how research focuses on driving great developer experience. And finally, we couldn't avoid talking about AI and why it seems to be a game changer for entrepreneurs, but not so much for teams yet.

01:23 Introduction
02:45 Abi's journey in tech
08:19 The four key metrics
10:41 Metrics' reliability
13:41 Diagnostic metrics
16:06 A metric analogy
18:23 Finding productivity metric drivers
22:03 What makes a developer experience good?
29:44 The importance of comparison
31:53 Common issues in developer experience
34:55 Are meetings bad?
36:16 AI in development process

This episode is brought to you by https://sleuth.io

You can also find this at:
• 📬 Newsletter: https://refactoring.fm
• 🎧 Spotify: https://open.spotify.com/show/7Luds9dmzZDoDC8Q7EMbSw
• 📱 Apple: https://podcasts.apple.com/us/podcast/refactoring-podcast/id1719137305

For inquiries about sponsoring the podcast, or appearing as a guest, email: [email protected]

Transcript

Introduction and Guest Background

So I think that's something we still see so much today: whenever you get into the metrics conversation, you kind of get pigeonholed into just obsessing about the metrics and not thinking about, well, how do these metrics actually tie into the organization? Like, what is the...

Will Larson, someone I think both of us look up to, had this great phrase, "theory of improvement." So what is the theory of improvement around how you apply these metrics? How do you actually improve the organization and the system? And so, yeah, it's a continuous problem.

Hey, Luca here. Welcome to a new episode of the Refactoring podcast, where every two weeks we interview a world-class tech leader.

Today's guest is Abi Noda, the CEO and founder of DX, one of the leading engineering intelligence platforms. With Abi, we talked about measuring developer experience. We started with the early days of Accelerate and why we feel like most people got the book wrong. Then we continued to the present day and how research focuses on driving great developer experience.

And finally, we couldn't avoid talking about AI and why it seems to be a game changer for solopreneurs, but not so much for teams yet. So let's dive right into it! Hey, Abi, welcome, and thank you so much for being on the show today. Thanks so much for having me, Luca. Great to be here.

So you are the CEO and founder of DX, which is a software engineering intelligence platform. And the reason why we are having this chat is that I believe DX constantly publishes some of the most interesting research and frameworks about developer experience and developer productivity. So first of all, thank you so much for that. Yeah, great to hear.

The Pull Panda Story and Early Metrics

I want to start with a funny story, actually, that I discovered just recently, because for years I've told everyone that you can start measuring developer process things with just very small steps, using as an example the fact that my very first experience in measuring these things was with pull requests many years ago, using a tiny tool called Pull Panda. And only after many years did I discover that you were the founder of Pull Panda.

That's fantastic. So I would just ask you to rewind the tape a little bit and to break the ice, just tell me more about your journey in tech and how you ended up working in developer productivity.

Yeah, well, first off, I've given a few lengthy talks on the Pull Panda journey that are public, so for anyone interested, I would recommend checking those out in the show notes. But, you know, I'm a software developer, I have always been a software developer, and about eight years ago I transitioned into engineering management. In one of my first jobs, I actually became the CTO of a startup. And about one week on the job, the CEO came up to me and said,

All the other departments, the sales team, the marketing team, they're reporting metrics each month at the leadership meeting. Can you start reporting on the productivity of engineering as well? I thought, wow, that seems like a pretty reasonable question, and I started reaching out to mentors and looking online for what metrics you should be using, and then quickly realized there really wasn't a good answer in this space.

Luca, I always joke with people that I'm still trying to answer that question eight years later; I'm still trying to answer that question for my CEO. But, you know, along the way, Pull Panda was a company I started, actually really on the side, while I was working at that job. And it wasn't actually for the metrics problem. As an EM, I just found myself having to nudge my team to get their pull requests over the line constantly. And I thought...

you know, if I could just automate that with a Slack bot or Zapier or something like that, it could be valuable. And so, yes, I ended up building an MVP. I actually got fired from that job. And when I got fired, I started working on this project, thinking at least it would be something to kill the time with. And it just so happened that it really struck a chord and kind of took off.

And so we had the pull request reminder product, and then I later added Pull Analytics, which was the pull request metrics and analytics product, which really gets back to that problem that I had become fascinated with: what are the metrics you can actually use in engineering to drive productivity? Yeah, I love the story. And I think, looking back, one of the things that I loved

about the tool is that it combined the quantitative side of numbers, but it was also very actionable. It gave you, as you said, the nudges on Slack, and so it actually made an effort to drive the right behavior in the team. So it was a good early example, super early I should say, of not just giving you the numbers in a dashboard, but actually doing the practical things that really helped. Yeah, no, I think

still to this day at DX we're thinking about the same problems, and I often do come back to just how effective and simple Pull Panda was in terms of combining the data, but also realizing that dashboards aren't really that valuable, because most people don't have the time or interest to go and look at dashboards. And so it was really the alerts that were the sticky product with Pull Panda.

Hey, before we continue, I want to spend one minute to tell you more about our podcast sponsor for this season, which is Sleuth.

Because chances are you know of engineering metrics programs that did not get adopted or failed to deliver the promised productivity gains. But why? Because oftentimes these metrics live in dashboards, and dashboards die of loneliness: they were built for a purpose, but then people just forget about them. So Sleuth Pulse solves this problem by bringing metrics and people together, empowering engineering reviews.

So you can easily use metrics in sprint reviews, monthly check-ins, or resource planning to get better alignment and better execution. This is truly the best way to make metrics a part of your workflow. And Sleuth improves these reviews by bringing in quantitative and qualitative data, letting people add context, and using AI to summarize data and pick out outliers. So check out Sleuth.io for more information.

Rethinking Accelerate and Dora Metrics

Yeah, yeah, it's true. So, moving closer to the present day, we have already written an article together about all the shifts that are happening in how developer experience and productivity get measured these days. And I got to think more about this lately, because in the community we have a book club. Every two months, we vote on a book to read; we read it and then review it together at the end of the period. And the last book we picked was Accelerate, which I had already read, but...

Back then, I think it was 2019 or something like that. And I read it again in the light of everything I know today that I didn't know back then. And one of the things that struck me the most, and I thought about you and I wanted to ask you this, is that I feel like the perceived legacy of Accelerate today is a lot about Dora metrics. But the book when you go through it is actually mostly about culture and engineering practices. It's really like 90% a qualitative book.

if you can call it like that, and the quantitative part is really tiny. So do you feel that way as well? What do you think about this? Yeah, there's kind of a running joke among, I would say, the OGs, the insiders, the people who've been around the Accelerate, DORA, IT Revolution kind of community. The running joke is that when that book got published, everyone just

skipped to a page, I think it's page 19 or 20; they just read the page with the four metrics and then put down the rest of the book. So yes. And what people mean by that joke is exactly what you brought up, right? Obviously, in the book there's a definition of these four key metrics that have statistical power behind them. But really, the book is about how you transform an organization, about the

best practices that organizations should be focused on at that time in the industry. And the metrics are just a way of demonstrating and verifying the value of adopting those practices. So I think that's something we still see so much today: whenever you get into the metrics conversation, you kind of get pigeonholed into just obsessing about the metrics and not thinking about, well, how do these metrics actually tie into the organization? Like, what is the...

Will Larson, someone I think both of us look up to, had this great phrase, "theory of improvement." So what is the theory of improvement around how you apply these metrics? How do you actually improve the organization and the system? And so, yeah,

it's a continuous problem, but certainly something that happened with DORA. Yeah, absolutely. And I think one of the reasons why they really stuck in people's minds is that it was possibly the first time that somebody attached real numbers, with benchmarks from the industry, from many thousands of companies, to something that had not been measured at all

before. So maybe, you know, when it comes to the practices, the culture, things like generative culture, trunk-based development, all the things that the book mentions, it was not the first time that many people had learned about that. But the four metrics and the numbers associated with elite teams and good performance, that was genuinely new for many, many people. I think it was maybe that. Well, you're absolutely correct, actually.

Until Accelerate, really the only meaningful discourse was around whether or not it was good to measure lines of code, whether or not it was good to measure story point velocity. Accelerate was the first time anyone really came out and said, hey, those aren't good ideas, and here's a set of metrics that really are good. It was really the first time. And I remember that because that book came out

while I was working on Pull Analytics. And I was really stuck. I was really stuck trying to figure out what good metrics are. I had lists of every single metric you could possibly think of. And I have a funny story about when I was literally reading Accelerate. My dad is a retired software developer who hated metrics, a classic developer who hates metrics.

He came home one day while I was staying at the house, and I told him, hey, look at this book, look at these metrics, I think this is the answer. And I remember he was like, what? Why are those the right metrics? Because someone said they are? Some person in the book just said these are the metrics? Yeah, yeah. But there's a lot of

validity to that counterargument now, thinking back and knowing that there are a lot of flaws in the DORA metrics and that a lot of organizations have sort of hit dead ends with them. You know, it's funny to think about, but that same argument could be made for any of the stuff we're putting out at DX too, right? Like, why is that the right way to do it? Because someone said so. So, just kind of a funny story.

Yeah. And I think the same stories happen today. I mean, the goalposts keep moving, but you always have skeptics, and people who are instead big believers, early adopters, and so on. And I think there has been a transition, because at the beginning, as you said, the difference was probably about metrics that were genuinely bad, like lines of code, or maybe even velocity at the company level,

versus metrics that feel good, like good metrics versus bad metrics. And then I think the whole conversation more recently got more nuanced, and we started saying, okay, they are good metrics in isolation, but good for what? They are not good for everything, right? And so we get more to the present day, and we talked about this in the past; you actually taught me that

Diagnostic vs Input Metrics for Improvement

the DORA metrics are mostly good for diagnosis, but not for using them directly to improve things in your team. So can you explain this shift better? Yeah. Diagnostic versus improvement metrics is one kind of mental model that I think is helpful for this particular topic. I think another way to think about it is input versus output metrics, or as Amazon calls it, controllable input metrics versus output metrics. Actually, if you do some googling and look up Nicole Forsgren's own

writing and speaking about the DORA metrics, you'll actually hear her say that the DORA metrics shouldn't be targets, that optimization of the four key metrics is not the goal. And, to your point about Accelerate the book, that the DORA metrics are really meant to be the outcomes that are driven by the adoption and implementation of the capabilities

that are described in the book. And so when you're thinking about an organization and how the metrics fit in, the anti-pattern is to say, okay, let's get the DORA metrics, let's make everyone look at dashboards with the DORA metrics, let's tell everyone that they need to improve the DORA metrics. That is not how the DORA metrics are intended to be used. They're really outcome metrics. And the input is supposed to be

actually adopting those best practices and improving the system. And then you hope to see that lift in the metrics as an outcome of doing that. But the metrics themselves are not the focus. I think that's where a lot of organizations take a wrong turn, and you hear a lot of stories about organizations that went down that path and then it didn't work out, right? Yeah, I completely agree.

It's counterintuitive sometimes, right? Because we are saying these are good numbers to track, and they give you a good picture, in the case of DORA, of your delivery process and many qualities. And then it only comes natural, you know, to say, okay, but if you're giving me a benchmark about elite performance on this and that, then I want to get there. You know, I want to set a target of getting there.

And even if we all feel smart by saying there is Goodhart's law, there are all these laws with funny names, it doesn't always feel natural to say yes, but you cannot really do that, because it doesn't work. So I think that's what comes hard for many people. And what I think is a good analogy... I think this is an especially difficult problem for organizations, because organizations are complex systems with

a lot of different people. To use an analogy, common diagnostic metrics for us as individuals, for our health, are things like blood glucose, cholesterol levels, your weight, blood pressure. And those are really outcome, output metrics. They are diagnostic metrics. Now let's say you want to improve your blood glucose, right?

But the first question is, okay, what do I do? You're not just going to suck glucose out of your blood, right? You can't actually control that metric directly. You actually have to say, okay, what I need to do is

eat this much protein every day, or eat only this much sugar per day, and those are your inputs. And just breaking down the problem like that, the inputs versus the outputs, that's what is often missing in organizations. They just get the output metrics and say, hey everyone, you've got to improve these metrics, and it's unclear what people are actually supposed to do.

So people just start distorting the numbers, for example; that's one side effect of doing that. And so that's why it's so important in organizations to separate the diagnostic output metrics from the tactical input metrics, really make that link clear, and get the organization focusing on the right inputs to actually drive improvements to the system.

Yeah, I love the analogy and I'm sure somebody listening has written that down already to tell their CEO in the next meeting. I think it's perfect. But so we are saying.

Developer Experience as a Key Input

we have to improve the practices first, but this way it's like we are back to square one. Like we have found the metrics, but you cannot use them. So what can, what should you use?

to drive the actual improvements, in a way that you can control what you're measuring and it is not toxic, and it reflects the quality of how your team actually works? Yeah. I mean, this is a hard problem, right? I think one way to approach the problem, which is what DORA did and which I think works, is you say: here are the output metrics, here's how we're sort of measuring success overall.

And then here are the things we expect or want teams to do. And the things you want the team to do could literally be binary: are you doing them or are you not? Are you doing trunk-based development or are you not? Or, for DORA, are you using version control or are you not? So I think an input metric can

literally be adherence to a binary yes or no: have you implemented a best practice? That can be an input metric. I think SLAs tend to be pretty good input metrics too. When you're looking at things like DORA lead time, or really any engineering productivity metric, there's a lot of wait time in the system, whether that's CI wait time or code review wait time.

And those are good inputs, right? I don't think those are stumbling blocks for most teams, so setting SLA metrics around those and measuring against that works. But you can really keep digging further and further down; you can get all the way down to individual performance being a factor that ultimately drives some of these productivity metrics. So it's a little bit of an unbounded problem.
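(To make the SLA idea concrete, here is a minimal sketch with invented numbers and an assumed four-hour review target; it only illustrates tracking a wait-time SLA as a controllable input, and is not DX's or DORA's tooling.)

```python
from statistics import median

# Hypothetical code review wait times (hours) for last week's merged PRs.
review_wait_hours = [1.5, 0.5, 7.0, 2.0, 26.0, 3.5, 0.8, 5.0]

SLA_HOURS = 4.0  # assumed target: a first review within four hours

met_sla = sum(1 for w in review_wait_hours if w <= SLA_HOURS)
adherence = met_sla / len(review_wait_hours)

print(f"Median review wait: {median(review_wait_hours):.1f}h")
print(f"SLA adherence (<= {SLA_HOURS:.0f}h): {adherence:.0%}")

# The team acts on the input (review rotations, smaller PRs, pinging reviewers);
# lead time is the downstream output you then hope to see improve.
```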

At DX, what we're really focused on as part of this whole picture is developer experience itself as a controllable input to the output. I mean, DX and all the research we do, all the writing we do, is founded on this thesis, this belief that the thing that actually drives productivity is developer experience: removing friction in the development process, both technical and social.

And, you know, I think we're right about that. I wasn't always sure. So at DX, we're really focused on using qualitative signals from the developers and getting organizations to focus on that. Hey, if you focus on identifying and removing the impediments and blockers and friction in the developers' way, then the output of higher PR throughput, faster lead times, all the things organizations want to see, should occur, right? That's

our theory of improvement, to use Will Larson's term. I absolutely love this angle, and I went through your framework, the DX Core 4, when we explained this, and you talked about developer experience. And one of the things that I love about this is that we are taking all these complex systems, computers, outputs, and production, and at the end of the day we are saying that what really drives quality and improvement is the quality of the human experience

that actually does the job, which is beautiful if you think about it, for us as humans

Defining and Measuring Good DX

who work with computers. And one of the things that I wanted to ask you is this: you work at DX with some of the best engineering teams in the world, think of Dropbox, Vercel, Intercom, and more. Developer experience is hard to define, I mean, hard to capture, when it comes to what makes for a good or bad developer experience sometimes. So what does good DX look like for you? What makes you say, look at that team,

they have a great developer experience? Yeah. So I'll tell you how we approach the problem. Early in DX's journey, we did a qualitative study to really understand what developer experience is and what it consists of. And what we found, in that paper, is that we identified, I think it was about 30 socio-technical factors, but that was really a distillation of a much longer list.

It's kind of like SEO, right? There's like a bajillion keywords, but you kind of like condense it into the main groups. And so really there's an unbounded number of things that affect developer experience. So one of the key problems here is...

what are the things that matter most? Because we can't measure everything, we can't focus on everything. And so at DX we've spent years continuing to iterate on that question of what matters most. We get at that by looking at statistical data: what parts of the developer experience link statistically to the outcomes we've talked about, you know, faster velocity, better retention, better productivity.

We also look at the feedback we're getting from developers and the organizations on where they see the most important areas and bottlenecks. So it's an iterative and evolving process. And as the nature of software development changes, like it is now a little bit with AI, the factors will also change. So it's a moving target. At DX, we've distilled it down to about 14

key drivers, as we call them, of developer experience. And these are everything from the ability to not have interruptions and be able to stay focused and have deep work, to things like CI, code review, the ability to actually understand the code you're working with, and feedback loops in your development process: when you make a change, can you actually verify that it works, quickly?

So there are about 14 different factors. And then what we do is roll that up into a single score, and that's what the Developer Experience Index is. And so when we think about what is good or what is bad, we're really looking at both that overall composite score, the Developer Experience Index, and those individual 14 dimensions, looking at those scores and comparing across organizations.
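(As a rough illustration of rolling per-driver scores up into one composite, here is a minimal sketch; the driver names, the 1-to-5 survey scale, and the unweighted average are assumptions made for the example, not the actual DXI formula.)

```python
# Hypothetical survey results: average agreement score per driver on a 1-5 scale.
driver_scores = {
    "deep_work": 3.1,
    "ci_experience": 3.8,
    "code_review": 3.5,
    "codebase_understandability": 2.9,
    "local_feedback_loops": 3.3,
    # ...the remaining drivers would be listed here
}

# Normalize each driver to 0-100, then take an unweighted average as the composite.
normalized = {name: (score - 1) / 4 * 100 for name, score in driver_scores.items()}
composite = sum(normalized.values()) / len(normalized)

weakest = min(normalized, key=normalized.get)
print(f"Composite score: {composite:.0f}/100")
print(f"Weakest driver: {weakest} ({normalized[weakest]:.0f}/100)")
```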

It's a really interesting problem, because this is using self-reported data, and so sometimes comparisons across organizations can be challenging. For example, we found that in certain areas of the world, Asia for example, there are cultural tendencies to not be as harsh about rating your own developer experience. But that's true actually not just in

engineering; we found that it holds true in all the social sciences. So it's an interesting problem. But anyway, I'm rambling a little bit; to your original question, hopefully I covered how we approach it. Absolutely, I love this approach. It makes me think of something I discovered maybe

Benchmarking Developer Experience Data

two or three years ago, when I wrote an article about engagement, about how to make people more engaged at work. And I remember discovering the Gallup engagement survey.

Yeah, exactly. And I didn't know about that. And I remember, I mean, for those who don't know, this is a survey that has been running for many, many years. So they have a lot of data, and they run these 12 questions that identify the level of engagement and happiness of employees in their company. And what I loved about that is that they take a subject that is so ethereal and hard to understand and capture and define, and by using

data across so many respondents and so many years, they actually get to values that are very reliable and very practical, and you have benchmarks, and you can really point your finger to things that are going right and wrong. So when I learned about the DXI, about your index, I immediately thought about that, and I think it's a great approach. So you provide benchmarks about the different levels

of developer experience across different companies? Yeah, we typically do median, top quartile, and 90th percentile benchmark segments, and then we'll always break the benchmark segments down by size of company, industry segment, and location in the world. And the benchmarks are so important.
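(To illustrate the percentile-segment idea, here is a minimal sketch that places one organization's composite score within a peer segment; all numbers are invented, and this is just one simple way to compute it.)

```python
from statistics import quantiles

# Hypothetical composite scores for a peer segment (similar size, industry, region).
peer_scores = [58, 62, 64, 66, 68, 70, 71, 73, 75, 78, 80, 83]
our_score = 72

# Percentile cut points for the segment (1st through 99th).
cuts = quantiles(peer_scores, n=100)
p50, p75, p90 = cuts[49], cuts[74], cuts[89]

# Rough percentile rank of our own score within the segment.
our_percentile = 100 * sum(s <= our_score for s in peer_scores) / len(peer_scores)

print(f"Segment median: {p50:.0f}, top quartile: {p75:.0f}, P90: {p90:.0f}")
print(f"Our score of {our_score} sits around the {our_percentile:.0f}th percentile")
```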

The example I always give is, suppose you're measuring something like developer sentiment toward tech debt, right? Yeah. You know, that is universally low, right? Universally, developers are frustrated with the amount of tech debt that they deal with.

So that data point by itself is not really actionable or meaningful. It's only when you take that data point and compare it to the rest of the industry that you can really understand: oh, this is bad, but it's actually just as bad as everyone else, so it's not necessarily a problem, right? So benchmarking is really, really important. I totally agree. And I also think that's what actually helps engineering leaders advocate

for some action in some area of the development process or developer experience, because it's one thing to get into a leadership meeting and say, this is the value of our, as you said, sentiment on tech debt; people around you might not understand whether this is good or bad. It's another thing to say, we are in the bottom quartile, and we have to do something about that. People are more likely to understand that. Absolutely. Everyone wants to win. No one wants to be last.

No one wants to be behind their competitors. So benchmarks are a great social tool for driving buy-in and investment in the things that you want to drive change in. And I think that raises an important point about metrics in general. This metrics conversation has so many different angles to it, but one of the

purposes of metrics is not even to measure things per se, or to optimize things. Sometimes the goal of metrics is to have some data so that you can go make an argument to the business about something. And I think for the DORA metrics, actually, that's one of the main functions they have served, despite their flaws and limitations in what they actually measure and

how relevant they are to teams. They have been useful to a lot of organizations merely as a way of talking about investments, using the benchmarks, and asking for more investment in the things folks care about. Yeah, I agree. And one thing I was thinking while you were talking about this is that we always say it's hard to compare yourself to others; each team is different. So are there any caveats and factors that you have found really make a difference

here, so that you should only compare yourself with certain comparables in that respect? You mentioned, for example, Asian companies versus, I don't know, US companies, but maybe we can say big companies versus small companies, or product stage, or industry. Have you found that this value changes substantially based on some factors? The more you can compare like to like, the better, right? It's a sliding scale. It's not black and white, but...

To give you some examples: mobile engineering. Mobile engineering has a different set of tools, and it typically has a different deployment cadence than, say, web development. So that's a good example where, if you're comparing mobile engineering to non-mobile, there are some discrepancies that can only be reconciled by focusing the comparison on other mobile engineers.

We see similar things with platform or SRE roles, so non-standard engineering roles that are still part of the R&D organization. So broadly speaking, yes, there are discrepancies. Whenever you change a variable, there are patterns and discrepancies. So the important thing is, to the utmost extent possible, always try to compare

like to like. And it's not always easy to do, because it takes a lot of time to get these very tailored data segments and benchmarks, but that's what we strive to do, and I think that would be the recommendation for anyone, even internally, trying to benchmark teams against one another. Yeah, I love this. I mean,

Common Challenges in Developer Experience

I could ask a hundred questions drilling down into this data, because that's what I would do all day if I had all the raw data. So I also wanted to ask about the practices:

you mentioned some like having enough focus time, doing PR reviews on time, working in small batches, and things like that. So have you found, when it comes to offenders and bad practices that affect developer experience, that the bad stuff is evenly distributed, or is there some bad stuff that is more recurring?

And, for the good teams, is it like a secret sauce: the good teams have really figured out how to do this well, and that sets them apart from the others? Yeah, I can talk about a couple of things. The first is decision making, the speed of decision-making. I was talking to the CTO of a prominent company a few months ago, and I remember him saying, you know, yeah, you can optimize your CI/CD, but...

The real problem is that it takes three weeks to make a decision. And so that's actually a new factor we've recently introduced, and we're seeing a lot of statistical power in that developer experience driver, you know, in predicting velocity and high performance. Two things that I personally am really interested in, and that I see as a pattern in the data, are knowledge management

So you could call it documentation. It's not really documentation. It's just information. It's the difficulty of getting information, finding information, as well as focus time.

so, the ability to do focused work. Those two tend to be recurring patterns, recurring top problem areas for organizations. And what's really interesting is that there's no clear solution, from my perspective. For example, documentation: a lot of organizations think, oh, what we need is to consolidate our wiki onto a new, better platform. But it's not really a problem of needing a new wiki.

It's really an information creation problem, right? There's just knowledge that is lost; it is not captured anywhere, and no AI search tool is going to solve that for you. And focus time is interesting, because

organizations think, oh, you know what we need is no-meeting Wednesdays or focus time weeks. But that's not it; it's not even really meetings that are the problem. It's more just the chatter, the interruptions, the support load, the daily beating drum of work. And so both of those problems are things we're really digging into in our research, thinking about

you know, what can we do to help organizations more? They're really, I think, big and perplexing problems in software development, and really in all knowledge work.

Yeah, I love this angle, and it totally resonates with my experience. When you mentioned meetings: I think there's been this big discussion around removing meetings, as if meetings were useless per se. But, I mean, I don't know about your experience, but in my experience I've rarely found a meeting that was obviously useless. I mean, if it's there, it's for a reason. So you might say that

in an optimal, ideal world you might not need it, if you had this and that in place, but you have to get there first. So the fact that you need a meeting is downstream of the fact that you have to do a number of things first. So in a way it's like the discussion about the DORA metrics: it's not that you can optimize things by removing meetings, it's not that kind of action that you have to take, but you have to look before that.

Yeah, you have to look deeper, right? I mean, some meetings can actually be deep work. You can have really focused problem-solving meetings, pair programming, right? That's deep work. So yeah, simply canceling meetings is a very superficial attempt at solving that problem. But I don't think as an industry we really know how to solve that problem at a deeper level right now. So I think that's one of the things that's really interesting about it.

AI in Development: Hype vs Reality

Yeah, absolutely. So I want to segue into another topic, because these days no conversation can be complete without mentioning AI. So, since you work with numbers and many teams, I will tell you, I'm perplexed

sometimes by this apparently big divide that you see between the incredible stories on social media, on X, on LinkedIn, of people vibe coding incredible things, and then the fact that when you speak with real-world teams and larger organizations, the apparent impact of AI is still tiny. And that's not just anecdotal evidence; you see it even if you look at many reports, like the latest DORA report.

Apparently the impact of AI is very small. So, do you work on this with the teams that you're following at DX, and what's your take on where we're at with AI in the development process? Yeah. So this is a big focus area for us: helping organizations measure what impact AI is having, what the adoption level of AI is amongst developers, how they are using it, and how we can help them use it better. It's very early days. And what we see in the data right now is...

yeah, I would say on average something like a five to ten percent productivity gain. I think that is kind of where folks fall, both at the individual level, speaking of individual developers, and organizationally. In terms of how we're measuring that, primarily we've found that self-reported measures of how much time you are saving thanks to these tools are a really good direct signal. We do look at indirect signals too, like cross-sectional analysis of

PR throughput: is PR throughput higher? We don't typically see dramatic impact there, to no surprise, because I think PR throughput is a little bit flawed; it's a flawed premise to begin with. But you do see people bragging about the lift they're seeing in PR throughput on social media, things like that. You know, I think a lot of that is correlation, not causation.

Yes, we see that a lot in our data. I think, speaking personally, with DX and the data we're seeing with customers and our own experience with AI, I think the code maintainability and quality problem is a real concern. Code maintainability is one of the things we measure as part of the Developer Experience Index, and we have seen it be lower for teams that are using

GenAI more. And that makes total sense, because they're not writing the code, so they're less familiar with how to maintain it. But I think when you look at these tools, they're really good at generating boilerplate. They're really good at scaffolding, that zero to one. But most software development isn't zero to one; it's one to 1.1, 1.1 to 1.2. It's maintenance, iterative improvement. And so I think that's where the industry isn't

talking about that problem as much yet. You know, certainly the vendors don't want to talk about that problem. And so I think that's the more interesting question: can AI tools evolve to really be effective at that point of a codebase's maturity, or are they going to continue to just be zero to one and then cause problems later on, going from one to 1.1? So you think that the biggest reason for this gap is

because larger code bases are hard for AI systems to have an impact on, because there is more context, you still need human judgment, and so on? And the stakes are higher, right? You know, I think most of what we see on social media is, oh, I built a SaaS app in two hours with some tool. Like, that's great, but...

Well, when you get a feature request tomorrow from a real customer saying, hey, can you change this, can you change how this works, the real question is how hard it will be for you, with or without AI, to go make that change. Right. And I'm saying that

most software development is more the latter, less the former. And if the latter is slower because you used AI for the former, then there's a trade-off there. And I think we're starting to see that in the data. And you hear that anecdotally as well. Yeah, I saw a very revealing stat that said that if you try to make an LLM solve

abstract LeetCode problems, so with very little context, they get them like 85% to 90% right. So the score is very, very high. But if you give them a single ticket from a real team, on a real code base, picked at random, so it might be a bug fix or something else, the success rate decreases to less than 20%. Which is already a high number, if you think about it in the grand scheme of things, but it's dramatic to see the difference. So I agree that it has to be

something about context that we still need to figure out. Yeah. And there's just so much hype, right? I mean, I am very pro-AI, I'm optimistic about it, but there's so much hype right now that it's misleading to a lot of organizations who have false expectations; especially executives have false expectations right now. And the nuance gets lost because of the hype on social media and in the news.

AI's Potential Beyond Coding

So that's one of the challenges. I agree. And outside of coding, when it comes to the other parts you mentioned before that impact the developer experience, so we talked about information sharing, knowledge bases, focus time, meetings, have you found some areas where AI is also having an impact, maybe areas that are less discussed than coding?

I think the knowledge management problem is ripe for AI to come and help us out. And I think there are some vendors out there attempting to solve that, both on the back end and the front end, meaning both the discoverability side, so AI search and discoverability of knowledge, as well as AI-generated documentation and knowledge. I think those are two components of probably a bigger

solution as a whole to how we start to tackle the problem. But yeah, I think AI is well suited for the knowledge problem. I don't see it necessarily tying into the focus time problem as much, though there are probably applications. Yeah. It's still early days. I guess we'll see, but things keep changing so fast that I expect our opinions to become obsolete by the time this podcast episode comes out. So it is what it is.

So thank you so much, Abi, for this great chat. I loved having you on the show. So thank you so much. Yeah, thanks for having me, Luca. Thank you so much for listening. If you found this chat valuable, you can subscribe to the show on YouTube, Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the show. You can find all past episodes and learn more about the show at refactoring.fm.

See you in the next episode.
