¶ Intro / Opening
ML projects tend to fail in these common failure modes. One is what I call POC hell. Another failure mode is that it's easy to get maybe the first thing out of the door, but then it's kind of stuck together with duct tape and sticky gum and it's hard to test or change. In my experience of working with data-intensive ML initiatives or other technology specializations, I'd found that the ability to build the thing wasn't always the major success factor.
Often it was understanding what the right thing to build was, making sure that there was alignment between technical teams and business teams and product teams on the right thing to build. You can't MLOps your problems away, just like you can't DevOps your problems away. MLOps helps in a few ways, but MLOps is not going to run your tests for you. It's not going to talk to users and make sure you're building the right features.
It's useful to think about the essential characteristics of ML solutions: they'll be able to do things at speed and scale that no human could. But they'll make mistakes. We wouldn't consider an ML solution if we knew the right answer every time. We'd write some rules instead. One of the opening quotes in our book was from W. Edwards Deming: "A bad system will beat a good person every time." The whole thesis of our book is how do we create those systems to help teams build the right thing, build the thing right, and in a way that's right for people.
Hello everyone, welcome back to another new episode of the Tech Lead Journal podcast. Today I have with me two Davids, and I'm very excited. One is David Tan, who is actually my ex-colleague at ThoughtWorks, and the other one is David Colls. They are co-authors, together with Ada Leung, who couldn't join us today, of a book titled Effective Machine Learning Teams. Even though machine learning is in the title, I think we are not going to cover solely machine learning, because I'm also not an ML expert. But we'll discuss things like how we can build effective machine learning teams, what kind of practices we should be aware of, and things like that. So welcome to the show.
I'll refer to David Colls as Dave, and to David Tan as David. So welcome to the show, guys. Yeah. Thanks for having us, Henry. Happy to be here. Thanks, Henry, nice to be here.
¶ Career Turning Points
Right. I always love to ask my guests first to share a little bit about themselves by sharing career turning points that we all can learn from. So any one of you can start first. Cool. So I'm currently an engineering manager in the AI Products group at Xero, and my career into tech hasn't always been straight. I was actually non-technical when I graduated from university.
So I was working in government for maybe two to three years, and I accidentally caught hand, foot and mouth disease and had to be quarantined for eight days. From there I started watching some YouTube videos about data analytics and coding, and then tried some tutorials, and it was really cool. So then I decided to join a programming boot camp, quit my job, and did that.
And then I joined ThoughtWorks. And yeah, I learnt a lot about agile, test-driven development, refactoring, and CI/CD, got into ML engineering, and finally got into this exciting space of ML engineering and MLOps, building ML products. So yeah, I have to thank this eight-day quarantine back in the day for my career turning point. Yeah, maybe.
How about you? To pick up on an external theme like that, I'd probably choose deciding to play ultimate frisbee early in my career, where I met a number of people involved in the software scene, and in particular a ThoughtWorker who encouraged me to interview at ThoughtWorks. And when I think about turning points, I guess that leads me into the fact that I started as a developer in
technical, data-intensive roles. I then looked for ways to make more impact in the leadership space. And when I joined ThoughtWorks, although I'd actually passed a coding interview, I joined as a project manager. I then worked for a number of years in agile transformations and organizational transformation before then
picking up the data practice. So I guess my career has turned from deeply technical to leadership positions and organizational design a number of times over that period. Thank you so much for sharing your story. Very interesting, right? So the kind of serendipity that could happen through our career, right? And what led us to where we are at the moment.
¶ Writing "Effective Machine Learning Teams"
So today we're going to discuss topics from your book, Effective Machine Learning Teams. Maybe in the beginning, let's share what made you write this book. Yeah, this book started as a blog post, actually. Back in 2019 we had wrapped up a project that involved a lot of refactoring of data science code bases. We had data scientists who wrote code in Jupyter notebooks. No tests, a lot of lines.
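As a rough illustration of the refactoring described here (moving notebook logic into functions with automated tests), a minimal Python sketch might look like the following. It assumes a hypothetical notebook cell that computed an account-age feature inline; the function names and fields are invented for the example and are not taken from the book.

```python
from datetime import date


def account_age_days(opened: date, as_of: date) -> int:
    """Age of an account in whole days on a given reference date.

    Passing `as_of` in (rather than calling date.today() inside) keeps the
    function deterministic, which is what makes it easy to unit test.
    """
    if opened > as_of:
        raise ValueError("account opened after the reference date")
    return (as_of - opened).days


def add_account_age(rows: list[dict], as_of: date) -> list[dict]:
    """Return copies of the rows with an 'account_age_days' feature added.

    The input rows are not mutated, in contrast to notebook cells that
    often modify shared state in place.
    """
    return [
        {**row, "account_age_days": account_age_days(row["opened"], as_of)}
        for row in rows
    ]


# A pytest-style unit test: pytest collects functions named test_*, and CI
# can run it on every change.
def test_add_account_age():
    rows = [{"id": 1, "opened": date(2024, 1, 1)}]
    result = add_account_age(rows, as_of=date(2024, 1, 31))
    assert result[0]["account_age_days"] == 30
    assert "account_age_days" not in rows[0]  # original input is untouched
```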
You know, it was good; it was solving a real business problem, and then we had to productionize that. And so we started writing about how we did that. How do we encapsulate code into functions? How do we have automated tests, so that we are clear about the quality? How do we do CI/CD for those
changes? And then we wrote a blog post and it kind of exploded on what was then Twitter. Someone from the community shared it, it got a lot of likes, and it was like, OK, I think we're on to something here. There's this desire, I think, in the data science community or ML community to have more engineering and robust practices. So then we started writing more and more from the subsequent projects, about what we did for large-scale ML training pipelines and large-scale,
high-volume ML products, and the practices that helped us build ML products reliably, so that they're safe and you get fast feedback on your changes. Some of those things worked really well, so we wanted to share them more broadly with our readers. And that's just the engineering space. And Dave also brought some other aspects.
We started with this thread of engineering practices for ML, and then we discovered this whole world of other practices that also help ML teams. Yeah. So in my experience of working with, I guess, data-intensive ML initiatives or other technology specializations, I'd found that the ability to build the thing wasn't always the major success
factor. Often it was understanding what the right thing to build was, making sure that there was alignment between technical teams and business teams and product teams on the right thing to build, which could be complex when people don't speak the same language. So what sort of processes and techniques can you use to build that alignment, especially when there's a lot of uncertainty both about the problem and the solution?
And then also because of the specialized nature of this work, it's often hard to build a team that can execute end to end. So then the factors around either building in a single team, how do you bring all of those different perspectives together? Or when you have to rely on multiple teams to get a piece of work out the door, what's the best way to achieve fast flow and high quality in a way that's sustainable for the people working in that environment?
So for me, those were the perspectives I wanted to bring to machine learning product development, which is an essentially multidisciplinary activity. How do we do that better as teams? Right. When I read your book, I must admit that I was expecting to learn a lot of technical things specific to ML projects and ML products. But actually you cover a holistic approach. It's not just the engineering, but also product delivery and things like that. And I believe there are so many things that any software engineering team could learn just by reading the book, putting aside some of the ML parts.
¶ ML Engineering vs Other Types of Engineering
But let's start at the beginning, because I'm sure many listeners here don't come from an ML background. What are the differences between ML engineering and the typical software engineering these days, where people build websites, APIs and things like that? Yeah, that's a great question.
And I think ML systems are fundamentally harder and more complex than software engineering in some ways. Of course, software engineering is hard in itself: you've got microservices, you've got distributed computing; it's an art and a whole discipline. With ML, the difficulties manifest themselves in some common ways. It's not just one service or one component.
Usually there are data pipeline dependencies upstream, which are complex in themselves. You've got scale or compute requirements for your own ML workload, which can also be handled in many ways. You can have a really large instance with really large memory and do everything there, but at some point that breaks down. So how do you architect it, or make it simple to scale up large workloads? And then there's also the monitoring.
When you think about software products, typically your monitoring is your four golden signals, right? Success rate, latency, things like that. You can do that for ML and be all green and good, but that says nothing about the quality of the predictions the models are producing and their correctness. So it adds another layer of challenges that I think the MLOps community has come to solve
in recent years. And then even further down: yes, we've built this product and it's giving accurate predictions to an extent, but what is the business impact, and what business levers is it moving? So that's essentially the fundamental difference between ML products and software engineering products. There's a well-known diagram from Google where the ML code is just a little box, and around it there's infra, there's monitoring, there's data
quality, data skew. So there are a lot of moving parts, and that's why we wanted to bring a lot of practices that help you manage the complexity in these different subcomponents. And when you look at how to manage the work as well, then I guess one of the characteristics of building an ML feature is there's a lot less certainty about how it's going to perform upfront.
And so often you have to integrate exploratory data analysis, building prototypes, or architecture search, as it might be described, into the product delivery process as well. And then as you converge on a potential solution, you're also dealing with another vector of change, which is the data and the world changing as well. As David said around monitoring, often that changes slowly, but sometimes we can see step changes.
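Monitoring for this second vector of change needs more than the four golden signals mentioned earlier. One common technique, chosen here for illustration rather than taken from the book, is the population stability index (PSI), which compares how a feature or model score is distributed in production against a reference window. The sketch below is stdlib-only and assumes the bin edges are chosen up front; the 0.2 alert threshold is a commonly quoted rule of thumb, not a standard, and would need tuning per use case.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges):
    """Fraction of values in each bin; `edges` are sorted interior bin boundaries."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference, current, edges, eps=1e-6):
    """PSI = sum over bins of (cur - ref) * ln(cur / ref).

    `eps` floors empty bins so the log and the division stay defined.
    """
    psi = 0.0
    for r, c in zip(bin_fractions(reference, edges), bin_fractions(current, edges)):
        r, c = max(r, eps), max(c, eps)
        psi += (c - r) * math.log(c / r)
    return psi


def drift_alert(reference, current, edges, threshold=0.2):
    """True when PSI exceeds the (illustrative, tunable) threshold."""
    return population_stability_index(reference, current, edges) > threshold
```

In practice `reference` might hold last quarter's values of a feature and `current` this week's; a step change like the COVID shift described next would show up as a sharp jump in PSI even while latency and error-rate dashboards stay green.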
COVID was a classic example, with recommender systems moving from lowest price to best availability, and ML models needing to catch up with that. So yeah, you both have less certainty and you have another vector of change to deal with when it comes to managing that work. Yeah. So thanks for sharing some of these complexities.
So definitely, for people who may not be able to relate: typically, from what I can see, there are many data pipelines when you build ML. You probably need more than just small data, something like medium data or big data, and the compute to match. Sometimes it's distributed computing with large-scale pipelines. And I think, typically, the feedback loop might be long in an ML project simply because you have this training step in the first place.
¶ ML and LLM
Maybe let's clarify. When you say ML, I think we are not just associating it with LLMs, right? These days people are crazy about LLMs and maybe they now equate ML with LLMs. So maybe a little bit of insight here: what do you mean by
ML projects? Yeah, great observation, Henry. I was thinking, as you talked about feedback cycles and big data and all of those elements, yes, those are the things we traditionally associate with supervised machine learning, which I guess is our primary focus in the book. That might be anything from a binary classification problem, like: is this transaction a fraudulent transaction?
To solve that with the supervised paradigm, we get a lot of historical data about transactions: their amount, the age of the account, maybe some information about the network around that account. And then we label those transactions as fraudulent or not. That's actually a nice example of where the labels might be easy to get, because we might get them from customer reports of fraud. But in general, the effort of labelling the data set can be
really intensive as well. So that's the main paradigm we're looking at. But looking through this lens of working with uncertainty, working with data, and working with specializations, there's a whole lot of other ML and AI paradigms that might fit. Obviously, working with a large language model in inference mode is still a
challenging environment. The supervised paradigm might apply to fine-tuning, or even to training your own large language model from scratch. But beyond that paradigm, we're also looking at initiatives that might use reinforcement learning, for instance, which is another approach to recommender systems. And this is where you might not need big data to start with.
You might not have historical data to start with, but you might actually build or tune your data set as you go. We might also look further afield to techniques like simulation, or operations research for optimization. They share a lot of the same characteristics. Simulation is actually very similar to machine learning when it comes to productionizing it. It's just that instead of a learned model of the world, you're working with an explicit
model of the world. But you still need to go through feature engineering, build the ability to monitor and assess business impact, and align to stakeholder expectations too. So while we're focused on supervised machine learning as a paradigm, yes, there's a big world out there. Anything that's decision-making driven off data or models of the world, or that requires specialization, might fit into the view that we've put forward in the book as well.
¶ Why Many ML Projects Fail
Thanks for the clarification. So maybe let's start going deeper into the book. One thing that piqued my interest in the first few chapters is that you cover some statistics. The first one is that many ML projects don't make it to production. And the second one is that even when they reach production, they don't actually solve the real business problem or
don't bring any value. So maybe let's share a little bit about why this is the case. What do you see out there as ML practitioners? Why is it so hard to bring ML projects to production? Yeah, thanks, Henry. That's the crux of the book: ML projects tend to fail in the common failure modes that we describe, and I can go through them, and they're largely preventable types of failure.
If we can learn from experience, from history, then we can bring in the right tactics. As you mentioned, let's say we invest half a year, nine months into a project and ship it, and we find that it's actually not solving the right problem, or users didn't really care about it. Then what techniques can we bring in there, like prototype testing or customer research?
So that's the main thesis of the book: how have we failed, what have we learnt, and how can we do better. One common failure mode in ML projects is what I call POC hell. It's a combination of factors. Maybe the business doesn't have enough of an appetite to release
something to users. So we might take the first easy step to build a prototype and test it, maybe do an internal demo of it, but the resources or risk to productionize it and put it in the hands of users are too great. The lack of commitment there means that teams on the ground just build POCs of POCs of POCs. That's detrimental to the business, in that you're missing out on opportunities, and it's detrimental to
the morale of the team. You do a few, and after a while you say, what's the point? So you move on, and then you've got churn, you've got onboarding, and you're losing talent. Another common failure mode, I think, is that it's easy to get maybe the first thing out of the door. You hustle, you get the data, you get the compute, you train
the model, you deploy the thing. But then it's stuck together with duct tape and sticky gum, and it's hard to test, change, or evolve based on new customer feedback, things like that. So that comes to continuous delivery and CI/CD: how confident can we be to test and deploy a change? This set of problems is addressed by MLOps, by continuous delivery for machine learning. There's probably a couple more, but I don't want to hog the mic.
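One way continuous delivery for ML makes a change safer to ship is a model quality gate that runs in CI next to the ordinary unit tests. The sketch below assumes a binary classifier exposed as a hypothetical `predict` callable, a held-out labelled set, and illustrative precision and recall floors; none of these names or numbers come from the book.

```python
from typing import Callable, Sequence


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def check_quality_gate(predict: Callable[[list], list], features: list, labels: list,
                       min_precision: float = 0.7, min_recall: float = 0.8) -> list[str]:
    """Return failure messages; an empty list means the candidate model passes."""
    precision, recall = precision_recall(labels, predict(features))
    failures = []
    if precision < min_precision:
        failures.append(f"precision {precision:.2f} < {min_precision}")
    if recall < min_recall:
        failures.append(f"recall {recall:.2f} < {min_recall}")
    return failures


# In CI this would sit in a test that fails the build when the gate fails, e.g.
# def test_model_quality():
#     assert check_quality_gate(model.predict, X_holdout, y_holdout) == []
```

Keeping the thresholds as explicit parameters makes the acceptable level of model quality a reviewed, versioned decision rather than something judged by eye before each release.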
Dave, were there any other failure modes or challenges that come to mind? Yeah, those are two big ones. Is there any value in doing this at all? That can be really challenging when responsibility is divided across different teams who might even be in different parts of the organization. I guess the release of ChatGPT sparked a lot of curiosity about how to best use LLMs in products, but we might have responsibility diffused across
the organization. We might have an ML team that's interested in looking at the use of large language models, and we might have a product team with its own road map, where these features aren't anywhere on that road map. And then we might rely on data, especially in the case of GenAI, that's unstructured and not well governed, so it's hard to make available to these systems in a responsible way.
And so while each group can do what's within its control to move a little bit towards a future state, it's not aligned or stitched together in a cohesive way that allows us to establish quickly whether there is some value, with cheap and
low-risk tests. And then, as David said, once you've established there's some value, if you've productionized something that's not supported by these good practices around MLOps and continuous delivery, which is often the case when you're just trying to understand if there is value there, then you start to understand how much value you're
leaving on the table. And this can be an opportunity to invest in those practices to maximize the lifetime value of an ML product. But that again requires a certain degree of maturity and a certain ability to work across the existing organizational boundaries, or to reshape them.
¶ ML Success Modes
Yeah. And the flip side of the failure modes is the success modes, right? We've also worked with and seen ML teams successfully deliver really great ML products, and we could see what tactics they used to succeed. Things like customer testing and user testing. Before they build out a prototype or MVP, which is a really expensive way to test something, even just talking to users: understanding what the pain points are, what their jobs to
be done are, and where ML could help. And then once they're bought in and say, OK, this is the right ML product that will solve those problems, they combat the failure mode Dave mentioned, of multiple teams throwing things over the wall to each other. You might remember the DevOps comic where you have devs on one side and ops on the other.
So instead of throwing code between data scientists and MLOps engineers, we do the inverse Conway manoeuvre, where we have a cross-functional team shipping end to end for this initiative. And when we talked to the team members who worked on that project, they enjoyed that end-to-end ownership, the satisfaction of releasing something to customers. You get to touch different parts of the stack; you learn about data science, you learn about ops.
You get fast feedback. You have the same standards. You ship things iteratively. You showcase it every two or three sprints. So that feeling of flow and speed was really amazing. And then when something is released, you've got your continuous delivery and production monitoring for when things are not going well. Yeah, and you have safety for
changes. If you need to refactor or upgrade a library, and your CI checks are all green and your model monitoring dashboard says performance is good, then you merge that PR and you don't feel nervous or stressed about any releases. So yeah, there are also bright spots in how teams deliver ML products. Right. I heard you mention POC hell in the beginning, and so much duct tape.
So when you bring it to production, I could relate to some other ML projects that I knew of before. I think many, many ML teams are probably in this mode. They keep building POCs, maybe because the expectations from the stakeholders or the business come with a lot of hype. These days people think
AI can do a lot of magic, especially with all the success of ChatGPT, and they even think that it's easy to do. But the reality may be different. And the other thing is about the practices. Many ML teams, first, don't have many good software engineers; they are data scientists or ML engineers who have a lot of expertise in building models,
but who don't have all these other skills like refactoring, continuous delivery, and testing. Or the flip side: a lot of software engineers being turned into ML engineers. They are not necessarily ML engineers; they come from, say, web application development and get turned into ML engineers. So some of these things
I can definitely relate to. And just to add to that, there's also the flip side of engineers or ML engineers being excellent engineers, but not treating it as a data science problem, where there is uncertainty and a need for experimentation to get fast feedback. So I think that emphasizes the need for cross-functional collaboration across data science, MLOps, SRE, things like that. So maybe let's go there, right?
¶ Ideal ML Engineering Team Composition
So, the composition of the team. You mentioned cross-functional teams a few times, and the silos between, for example, data science and the MLOps people, or whoever operates the system in the end. Maybe there are also other functions or teams involved, because ML typically also relies on a lot of dependencies, like data, or some kind of model, or things like that. So what is the best composition here, in your experience? Maybe you can advise us?
I guess there's a bigger question here, and it's what we tackle towards the end of the book. We look at multiple levels of team effectiveness. One of the levels we look at is an individual in a team: the types of practices you can use and the expertise you can build as an individual. The next level we look at is within teams: again, how those practices around technology and delivering product augment teams, but also broader team dynamics. But then the third level we look
at is between teams: how can teams be effective with each other? And so the best make-up of a team depends a bit on the shape of the team and its
interactions with other teams. This is where we use the Team Topologies model to identify the different types of ML teams that you might sit in. To run through it quickly, and we can come back and dive into details: you might have a stream-aligned team, and we'd say this is the basic unit to think of for a first ML project, a stream-aligned team that can deliver end to end. You're not going to be conducting groundbreaking ML research.
You're going to be using tried and true techniques in this case. Or, at the other end, where you have an established ML ecosystem internally, you can add another stream-aligned team that draws on those existing internal services quite easily. Then, as you scale beyond one team, there might be two routes that you take. You might go down the route where you have low-level common concerns across different initiatives.
That might be the opportunity for a technology platform to support ML at a lower level, like compute, data, and feature engineering. That would be a platform team in the Team Topologies team shapes; it would aim to provide those capabilities as a service, and it would be composed differently from a stream-aligned team. The other route you might take to scale is to identify
business clusters of ML needs. There might be something around audience engagement, or asset valuation, or content moderation. Those are specific sets of business needs that also come with an associated set of ML paradigms, and that don't necessarily have common support in a technology platform at the right business level of abstraction. So then you might have what's called a complicated-subsystem
team. And again, the make-up of that might be a little bit different. And then as you have a bigger ecosystem, there'll be a range of problems that come up of a similar nature, but with a unique presentation each time, which might be around, say, privacy or ethical use of data or optimizing particular techniques. This is where you might have an enabling ML team as well, which acts as a consultancy to other ML teams, with the aim of making itself redundant.
So that was a long way of saying the ideal make-up of a team depends, but you might assume it's a stream-aligned team and talk about the roles that sit in that. Maybe I'll throw to David. I think Team Topologies is a really useful set of constructs, the ones Dave went through, because it helps teams scale through the team API.
So as Dave mentioned, if we treat everything as a stream-aligned team, like in software engineering, we're very familiar with cross-functional teams. But then the failure mode there is Conway's Law, right? We've got three teams, so we build three sets of basically the same architecture and rebuild the same tools that we need. That's where maybe a platform team comes in to
abstract all of that. Beyond that, what we've seen work really well in one particular case was this complicated-subsystem team. As you mentioned, ML is hard, with a lot of moving parts, and they took on the effort to build this ML product. I can't share the specific example, but let's imagine it's car valuations. That's an ML product, a data
product. The API to this team, or this product, is: I will give you the valuation of cars. And that is now self-serviceable by other teams. The mobile team could integrate with this API and expose that ML capability in the app. A web team can do the same, or an email marketing team can do the same and send personalized emails about car
valuations. So that was how that team scaled: rather than integrating with each particular consumer, it encapsulated that complicated subsystem behind a formal set of APIs, either batch or real time, which helped the team achieve more and get more mileage out of the ML product they built. Yes, that's a great example. The complicated-subsystem team is probably going to have a
bunch of specialists. It's going to look maybe most like what people imagine an ML team looks like. But as David said, to scale their impact in the organization, they do need that business domain expertise. They do need to deliver as a service instead of constantly collaborating with other teams. Maybe some product thinking helps with that service definition as well. So that might be one team composition.
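To make "deliver as a service" concrete, here is a hypothetical sketch of the interface a complicated-subsystem team might publish for the car valuation example: one real-time call and one batch call over the same model, so that the mobile, web, and email marketing teams depend on the contract rather than on the model internals. Every class, field, and the toy pricing rule below are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


@dataclass(frozen=True)
class CarListing:
    make: str
    year: int
    mileage_km: int


@dataclass(frozen=True)
class Valuation:
    estimate: float
    model_version: str  # lets consumers trace which model produced an estimate


class ValuationModel(Protocol):
    version: str

    def predict(self, listing: CarListing) -> float: ...


class ValuationService:
    """The team API: consumers call these two methods and never touch the model."""

    def __init__(self, model: ValuationModel):
        self._model = model

    def value(self, listing: CarListing) -> Valuation:
        """Real-time path, e.g. behind an HTTP endpoint used by the mobile app."""
        return Valuation(self._model.predict(listing), self._model.version)

    def value_batch(self, listings: Iterable[CarListing]) -> list[Valuation]:
        """Batch path, e.g. a nightly job feeding personalized marketing emails."""
        return [self.value(listing) for listing in listings]


class ToyDepreciationModel:
    """Stand-in for the real ML model: a made-up linear depreciation rule."""

    version = "toy-0.1"

    def predict(self, listing: CarListing) -> float:
        age_penalty = (2024 - listing.year) * 1_000  # fixed reference year keeps the toy deterministic
        mileage_penalty = listing.mileage_km * 0.05
        return max(30_000 - age_penalty - mileage_penalty, 0.0)
```

Because consumers only see `ValuationService`, the team can retrain or replace the model behind it without coordinating a change with every consuming team.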
Yeah, that's right. And if you replace that example of car valuations, which I think not everybody can relate to, with, say, travel recommendations or product recommendations, then the one team that spent all the effort in building product recommendations can now impact multiple parts of the business, by having the right team topology to draw that fracture plane
around "my team is doing product recommendations", and by applying data product disciplines: exposing this as an API and encapsulating the details. It's very exciting to hear Team Topologies mentioned for ML products and ML teams. Back then it was a revolutionary approach to how we create
different teams. I'm glad that you brought it up, because I believe many companies still think: OK, we want to build an ML product or ML project, so they just hire a few data scientists or ML experts and ask them to build the model first, without involving the other aspects of software engineering. That could be the stream-aligned team from the product side, or it could be the data side, or
it could be anything. I think that tends to have its challenges. So bringing concepts such as Team Topologies to think holistically about how we structure different ML projects is really, really important. Yeah, that's a really great point. And one of the opening quotes in our book was from W. Edwards Deming: "A bad system will beat a good person every time."
So you can hire the smartest data scientists and put them in an environment where it's not the right system, and then we get what you described there. So the whole thesis of our book is how do we create those systems to help teams build the right thing, that is, solve the right problem; build the thing right, that is, the engineering, ML engineering, and data science;
and do it in a way that's right for people. I think Dave puts it in a really good way: it's not just shipping and shipping, but doing it with the right team shape, the right collaboration, the right trust and psychological safety, and also the right processes to ship early and often. So yeah, I was really intrigued when you mentioned that you can't just put a group of data scientists or
engineers together. You really have to create that system where they can ship the right thing and solve the right problems. Yeah, thanks for adding that.
¶ Building the Right ML Product
To come back to the theme of product discipline: we know that a lot of ML projects don't make it into production, or they solve the wrong problem, or they don't bring value. Like Dave mentioned in the beginning, there are a lot of uncertainties when you build an ML model or ML product. Because of the way it works, it's a prediction, so there's some kind of ambiguity inside; with LLMs,
there's hallucination, right? So how can you come up with an approach such that, when you're first building the ML product, you can actually build something that brings business value, either to the users or to the organization? I think this is probably one hard aspect as well. So typically, how would you run this? It is hard, and it goes beyond the technical, although the technical informs it.
I find it's useful to think about the essential characteristics of ML solutions. We're exploring them because we think they're going to be superhuman in some aspects. They're probably not going to beat the best experts in a field, and that's maybe one myth that we should tackle straight up. But, you know, they'll be able to do things at a speed and scale that no human could. Then we also need to consider that, by their very nature, they'll make mistakes.
Again, we wouldn't consider an ML solution if we knew the right answer every time; we'd write some rules instead. So there'll be some percentage of mistakes, however small, in any ML solution by design, as well as the unforeseen mistakes.
And when it comes to those mistakes by design, we really need to understand the cost sensitivity: what value do we get out of a bunch of right answers, and what is the impact of maybe a very small number of wrong answers? Those could have a very large impact across all sorts of dimensions: financial, as well as security, bias, and fair treatment of all our stakeholders. So yeah, we need to consider that fallible nature of them as
well. And so we need to start with products that are designed to handle that failure. They have some upside when ML gets it right, but they're robust to the times when ML gets it wrong, because it will. Starting from that perspective, we can then identify some experiments about how well this needs to work. We can start with very simple baselines.
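To make the cost-sensitivity and baseline points concrete, here is a minimal Python sketch, assuming a binary classification task. The cost matrix values, the function names, and the toy data are all hypothetical illustrations, not figures from the book. The point it shows is that a majority-class baseline can score high accuracy while still carrying a significant expected cost per prediction.

```python
import random
from collections import Counter

# Hypothetical cost matrix, keyed by (actual, predicted). The numbers are
# illustrative assumptions: a missed positive (false negative) is far more
# expensive than a false alarm, and a true positive is a small benefit.
COST = {
    (0, 0): 0.0,
    (0, 1): 2.0,
    (1, 0): 10.0,
    (1, 1): -1.0,
}


def majority_class_baseline(train_labels):
    """Always predict the most common label seen in training."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: majority


def random_baseline(train_labels, seed=0):
    """Guess labels at random, in proportion to the training distribution."""
    rng = random.Random(seed)
    return lambda _features: rng.choice(train_labels)


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(ys)


def expected_cost(predict, xs, ys, cost=COST):
    """Average cost per prediction under the cost matrix (lower is better)."""
    return sum(cost[(y, predict(x))] for x, y in zip(xs, ys)) / len(ys)


if __name__ == "__main__":
    train_y = [0] * 90 + [1] * 10
    test_x, test_y = list(range(100)), [0] * 95 + [1] * 5
    for name, model in [("majority", majority_class_baseline(train_y)),
                        ("random", random_baseline(train_y))]:
        print(name, accuracy(model, test_x, test_y), expected_cost(model, test_x, test_y))
```

On this toy test set with 5% positives, the majority-class baseline scores 0.95 accuracy but an expected cost of 0.5 per prediction, all of it from the five missed positives. A candidate model would need to beat the baselines on the cost the product actually cares about, not just on accuracy.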
Sometimes that's even just predicting the majority class or random guessing and, you know, seeing how that works as a product. But ideally, to resolve this uncertainty, we get into some real data and understand its predictive potential. And so this is again where it's a challenging multidisciplinary exercise, where we're trying to proceed on
multiple fronts: are we building the right thing, can our proposed solution support the performance that we expect, and so on. Yeah. Well, just to add to that, I think that emphasizes the importance of the cross-functional nature of the work as well.
If we frame this purely as a data science or ML problem, then we can try our level best to go from, let's say, 55% accuracy to 99%. We will never get to 100%, and we will spend many, many weeks and months trying to get there. So it's not just an ML problem where we try to improve the model's accuracy, precision, or recall, but also: how can we design for these failure modes of the ML model?
So, for example, UX: designing a product in a way that shows users, OK, this is not a confident prediction, or this is a prediction, but would you correct it? Or maybe even giving users options: the model's top-1 prediction may not be where we want it to be, but its top-3 accuracy is much higher, so show the top three. So, designing the product in a way that mitigates these failure modes of the ML model.
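One way to read these UX suggestions is as a small presentation layer over the model's scores. The sketch below is hypothetical: the threshold of 0.8 and k=3 are placeholder values a team would tune with users, and `scores` is assumed to be a label-to-probability mapping from whatever model is in use. It auto-applies confident predictions, otherwise asks the user to confirm among the top k, and measures top-k accuracy to check whether offering several options is worthwhile.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Suggestion:
    label: str
    score: float


def present_predictions(scores: Dict[str, float], k: int = 3,
                        confidence_threshold: float = 0.8) -> Tuple[str, List[Suggestion]]:
    """Return ("auto", [top1]) for confident predictions, else ("confirm", top-k)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = [Suggestion(label, score) for label, score in ranked[:k]]
    if top and top[0].score >= confidence_threshold:
        return "auto", top[:1]
    # Low confidence: let the user pick or correct instead of acting silently.
    return "confirm", top


def top_k_accuracy(score_rows: List[Dict[str, float]], true_labels: List[str], k: int) -> float:
    """Fraction of rows whose true label appears among the k highest scores."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)
        hits += truth in ranked[:k]
    return hits / len(true_labels)
```

Comparing `top_k_accuracy(..., k=1)` with `k=3` on a held-out set gives a rough sense of how much a "pick from three" design could recover compared with chasing top-1 accuracy in the model alone.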
And yeah, it's easy for teams that don't have the right capabilities or skill sets to try to solve it only through the model. If all you have is a hammer, everything looks like a nail, and it's very costly to try to level up the accuracy; we may never get there. But with a cross-functional approach, we can ask: how do we design it differently? How do we talk to users? Do the users find this OK? That's a more holistic way to solve the problem.
Yeah, that hammer and nail point is really important as you start to shift to: OK, how do we make this viable, both from the perspective of solving the problem effectively and from being economically sustainable to maintain? And I guess there's another essential characteristic here: ML solutions are narrow. The very training process optimizes a loss function, which is a narrow definition of
success, but they're composable. And, you know, this is where I think GenAI can be really interesting. I'm not the only one to take this perspective, but I've described GenAI as a stone soup for innovation. The story of Stone Soup, if you haven't heard it, is that a weary traveller arrives at a village late at night and all they have in their knapsack is a stone. So they go to the first house in the village and they ask the villager there, could I have an
onion to make stone soup? You know, I've got the stone, all I need from you is an onion. That villager says, oh great, yeah, I'll provide an onion. They go to the next house and repeat and ask for carrots and, you know, proceed around the village. By the end of that process, they cook up an amazing soup that feeds the whole village and it
was all cooked from a stone. Consider that you might have a spark of an idea, and you might be able to prototype it easily, but it might actually be made up of many different components composed together to produce something that behaves intelligently. That needs to be factored into
that process as well. And one of those big components will be the differentiated data that you bring. Often the step from prototyping something in an experimental environment to actually plumbing those data pipelines in a way that's sustainable for production use can be a major step as well, and that needs to be considered upfront.
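As a rough illustration of "many narrow components composed together", here is a hypothetical Python sketch: a retrieval step over a tiny in-memory FAQ (standing in for the differentiated data), an answering step, and a guardrail that escalates instead of guessing. Every component is a toy stand-in assumed for illustration; in a real system each might be a separate model, service, or data pipeline.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, object]
Step = Callable[[State], State]

# Hypothetical stand-in for the differentiated data a team brings.
FAQ = {
    "refund": "Refunds are processed within 5 business days.",
    "delivery": "Standard delivery takes 3 to 7 days.",
}


def compose(steps: Iterable[Step]) -> Step:
    """Chain narrow steps into one pipeline; each step returns a new state."""
    steps = list(steps)

    def run(state: State) -> State:
        for step in steps:
            state = step(state)
        return state

    return run


def retrieve(state: State) -> State:
    question = str(state["question"]).lower()
    context = [text for key, text in FAQ.items() if key in question]
    return {**state, "context": context}


def answer(state: State) -> State:
    # Toy "model": return the first retrieved passage, or nothing.
    context = state["context"]
    return {**state, "answer": context[0] if context else None}


def guardrail(state: State) -> State:
    # Robust to the times the ML gets it wrong: hand over rather than guess.
    if state["answer"] is None:
        return {**state, "answer": "I'm not sure; passing you to a person.", "escalate": True}
    return {**state, "escalate": False}


assistant = compose([retrieve, answer, guardrail])
```

Each step stays narrow and testable on its own, and swapping the toy `answer` step for a real model would not require changing the others.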
And so, when we've explored AI and ML initiatives, we put a heavy weighting on that factor: where will the data come from, and how will you deliver it to the product or application? That's again one of those laws where it always takes longer than you expect, even when you plan for it to take longer than you expect. Like any software engineering
project out there, right? So I think the most important thing is that an ML product or ML project still needs product thinking from the very beginning, right? You've mentioned a few things: user interviews, experiments, building prototypes, making sure the data that you feed into the ML training is appropriate, right? And I like the mention of
failure modes, right? Because unlike other products out there, where we kind of know the inputs and outputs we want the features to have, an ML product could fail in, I don't know, unpredictable ways, right? We've seen so many examples in the news, like Google's image classification, or ChatGPT, Gemini, or other bots giving wrong, hallucinated answers. So how do you tackle that?
Plus, there's the whole aspect of data security, bias, and the fairness of the data that you use in training, right? That's another thing you should bring into your product thinking holistically so that you can come up with a very useful and valuable product.
¶ ML Engineering Best Practices
So let's go to the other discipline, which is the engineering side, right? What I've seen in the past is that a lot of ML code is highly unstructured. It's procedural. There's no proper modeling. It's just function calls over function calls, very complex to trace. And you mentioned in the beginning that you took on a project to
refactor an ML project, right? So what disciplines are typically lacking in the engineering aspect of ML products, and how could we do better? Yeah, I personally can resonate and relate with that experience. I had to get glasses recently, maybe because of age, but I think mainly because of looking at too much code all the time, sometimes late at night because of stress.
As a healthy team, if we do the right practices, with automated testing, continuous delivery, and automated deployment, then nobody should need to work late at night. So, a couple of things that you mentioned there. Number one, test automation is a big part. Every ML engineer and data scientist we've worked with has enjoyed the automated tests
that we introduced and added. And so I think at this point it's a problem of information asymmetry. We've got pockets of people who know how to do automated testing for ML systems, and in every team I've joined and worked with, people say, oh, I didn't know you could do that. So I think there's desire and demand for more automated testing of ML systems. One encapsulating story: we had a project with close to zero test coverage.
It was an LLM system. Over time we added the test pyramid: the simple things like unit tests, and integration tests that exercise our whole LLM application to check that it's OK. Those are still point-based tests, right?
And we also had deeper model eval tests: a suite of however many examples, and when we run it, 5 minutes later we know, OK, model accuracy is 75% or whatever that number is. Then we had an automated dependency manager for upgrades, such as Snyk, Renovate, or Dependabot. So Snyk opens a PR automatically, saying you need to upgrade this. The PR has all these green ticks, tests passing, and we automatically trigger the model eval.
We know, OK, performance is just as good, so within 15 minutes we could merge the PR, no stress, no effort. That's how we grew the capacity of the team, just by having this automated testing and model eval. You touched a little bit on software design or code design as well. And yeah, I think any code base, not just ML, is
susceptible to that. So I think it's about taking that next level of discipline to ask: can I extract a function for this? Can I have a readable variable name, not just df or x? All of those software engineering practices, which we can link in the show notes: the single responsibility principle, and my favorite one, the open-closed principle. Can you design something that is open to extension, but that you don't have to modify every time?
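To illustrate the open-closed principle in an ML feature pipeline, here is a hypothetical Python sketch (the transform classes and field names are made up for the example). Each feature lives in its own small, descriptively named class, and `build_features` never needs editing when a new feature is added: you extend the list of transforms instead.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List


class FeatureTransform(ABC):
    """One feature per class: a single, named responsibility."""

    @abstractmethod
    def apply(self, record: Dict) -> Dict:
        ...


class LogIncome(FeatureTransform):
    def apply(self, record: Dict) -> Dict:
        # log1p keeps zero incomes finite.
        return {"log_income": math.log1p(record["income"])}


class AgeBucket(FeatureTransform):
    def apply(self, record: Dict) -> Dict:
        # Decade buckets, capped so ages of 90 and over share bucket 9.
        return {"age_bucket": min(record["age"] // 10, 9)}


def build_features(record: Dict, transforms: List[FeatureTransform]) -> Dict:
    """Closed for modification: new features arrive as new transform classes."""
    features: Dict = {}
    for transform in transforms:
        features.update(transform.apply(record))
    return features
```

Adding another feature means writing one new class and appending it to the list; the existing transforms and `build_features` stay untouched, which keeps each change small to test and review.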
Probably I'm going into too much detail there, but I think the lack of design sometimes stems from the lack of tests. If there are no tests, then nobody can refactor. Refactoring is so scary, so risky, that nobody does it; we pick the path of least resistance. In the teams that we've worked with, the moment we added that safety harness, when you have those tests on the path to production, then a lot of
things can happen. One time we did a massive refactoring of a variable that was referenced in all sorts of places. We used an IDE shortcut and replaced it in, I don't know, 50 places. And then, you know, tests pass, commit, done, right? The effort was reduced from maybe days of testing to minutes, and 20 minutes later it was running in production. So yeah, a lot of these practices have been emerging.
I think it's just about spreading them more, and it's why we wrote the book: so that teams don't have to do late nights writing code like I did in one project, and can just enjoy the flow and work-life balance, with zero-stress production deployments because, you know, tests are passing and you've got production monitoring. So a lot of these engineering practices will really help teams feel the joy and flow of working with ML products.
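The dependency-upgrade story relies on the model eval running as an automated gate. A minimal sketch of such a gate in Python, assuming a labelled eval set and an accepted baseline score; the 0.75 baseline echoes the illustrative figure in the story, while the tolerance, function names, and exit-code convention are assumptions rather than any specific tool's API. A CI job (triggered, for example, on a Renovate or Dependabot PR) would run it and fail the build on a non-zero exit code.

```python
from typing import Callable, Dict, List

BASELINE_ACCURACY = 0.75  # assumption: score of the currently accepted model
TOLERANCE = 0.02          # assumption: acceptable drop before the gate fails


def evaluate(predict: Callable, examples: List[Dict]) -> float:
    """Fraction of eval examples where the prediction matches the expected label."""
    correct = sum(predict(ex["input"]) == ex["expected"] for ex in examples)
    return correct / len(examples)


def eval_gate(accuracy: float, baseline: float = BASELINE_ACCURACY,
              tolerance: float = TOLERANCE) -> bool:
    """Pass if the candidate is no worse than the baseline minus the tolerance."""
    return accuracy >= baseline - tolerance


def main(predict: Callable, examples: List[Dict]) -> int:
    accuracy = evaluate(predict, examples)
    passed = eval_gate(accuracy)
    print(f"accuracy={accuracy:.3f} baseline={BASELINE_ACCURACY:.3f} passed={passed}")
    return 0 if passed else 1  # non-zero exit code fails the CI job
```

In CI, a small wrapper script would call something like `sys.exit(main(model.predict, load_eval_set()))`, where `model.predict` and `load_eval_set` are placeholders for the team's own code.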
Yeah, I think the call-out around testing is really crucial, and so is allocating your effort effectively. In a regular software product we might use the test pyramid to direct effort, so that we have a large number of cheap low-level unit tests, a medium number of integration tests, and then a small number of end-to-end tests. When we're looking at ML applications and other data-intensive applications, we can add a
second dimension to that, so it's a grid instead of a pyramid. And that dimension is the data dimension. So we might have a lot of small, cheap tests around individual data points; canaries, I guess you might also describe those. We might also then have tests at a sample level. These are going to give us a little more insight than a point data test, but they're going to have some
variability, so there's going to be some tuning there. They still offer us much faster feedback than the final level, which might be a global test on all the data that we have. Often people don't think hard enough about balancing the tests across that spectrum of data. For example, you might be able to do a very quick training run on a very small subset of data.
If anything's misconfigured in the training run and the training doesn't work properly, that will break and you'll get that feedback really quickly rather than waiting hours for it. Or the training might pass; you might have a lower benchmark or threshold for acceptable performance on that small run, but at least you've tested end to end and got some feedback from a sample data set that it's likely to work
at the large scale. So it's all about bringing that feedback earlier. And when we come to that uncertainty at the front end as well, it's about being thoughtful about how you test under conditions of uncertainty, when you don't even know what you're looking for in exploratory data analysis. There it's kind of moving from unknown unknowns to known unknowns to known knowns through testing.
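The data dimension of the test grid can be sketched in Python as three levels of checks around a toy preprocessing step. Everything here is hypothetical: `clean_record`, the 0 to 120 age range, the positive-rate band, and the threshold "model" are assumptions chosen only to show the shape of point-level canaries, sample-level expectations with tuned tolerances, and a quick smoke-training run on a small subset with a lowered performance bar.

```python
import random
from typing import Dict, List


def clean_record(record: Dict) -> Dict:
    """Toy preprocessing step under test: clamp age, coerce types."""
    return {"age": max(0, min(int(record["age"]), 120)), "label": int(record["label"])}


def check_point_canaries() -> None:
    """Point level: a few hand-picked records with exact expected outputs."""
    assert clean_record({"age": "-5", "label": "1"}) == {"age": 0, "label": 1}
    assert clean_record({"age": "250", "label": "0"}) == {"age": 120, "label": 0}


def check_sample(dataset: List[Dict], sample_size: int = 500, seed: int = 42) -> None:
    """Sample level: statistical expectations on a random sample, with tolerances."""
    rng = random.Random(seed)
    sample = rng.sample(dataset, min(sample_size, len(dataset)))
    cleaned = [clean_record(r) for r in sample]
    positive_rate = sum(r["label"] for r in cleaned) / len(cleaned)
    assert 0.05 <= positive_rate <= 0.5, f"positive rate drifted: {positive_rate:.2f}"
    assert all(0 <= r["age"] <= 120 for r in cleaned)


def smoke_train(dataset: List[Dict], n: int = 200, min_accuracy: float = 0.6) -> float:
    """Quick end-to-end training run on a small subset, with a lowered bar."""
    subset = [clean_record(r) for r in dataset[:n]]
    # Toy "training": choose the age cut-off that best separates the labels.
    cutoff = max(range(0, 121, 5),
                 key=lambda t: sum((r["age"] >= t) == bool(r["label"]) for r in subset))
    accuracy = sum((r["age"] >= cutoff) == bool(r["label"]) for r in subset) / len(subset)
    assert accuracy >= min_accuracy, f"smoke run too weak: {accuracy:.2f}"
    return accuracy
```

The point checks run in microseconds on every commit, the sample check trades some variability for broader coverage, and the smoke run catches misconfigured training end to end long before a full run on all the data would.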
You know, visualisation is really key to be able to look at the data and understand what it's telling you, or you can use automated tools to find relationships in the data. And then, when you sort of understand what the data is telling you: does that visualisation look right? Does it look like I expect? That can actually be a form of testing, called visualisation-driven development,
at times as well. But once you understand qualitatively what you're looking for, there's a whole range of data science techniques that you can use to turn that into a binary expectation that can pass or fail. That might be useful in an exploratory environment, but it might also be something that you promote into a production integration pipeline
as well, as David was describing. So really thinking hard about how you use testing to get fast feedback all the way through the life cycle is pretty crucial. And just to add to that: LLMs, as you mentioned, have been all the rage for the past two years. With one of the teams we worked with, we had to innovate and think about how to shift that left. As Dave mentioned, full evaluation can be costly: it takes time, and it can cost money, especially for LLMs. So what we ended up doing was to
shift that left again. Before we kick off a big eval or any further deployments, we had an integration test, and one of the challenges the team raised was: how can you test something that's non-deterministic? You know, the answer is different every time. So we wrote an assertion function that asserts on intent rather than vocabulary. That way we can still evaluate that, for this response given these conditions, yes, it matches the intent of what
we had expected. We were using PyHamcrest for those kinds of matcher-style extensions, or you could use other tools as well. But yeah, sometimes at the forefront of a new capability, we have to be a bit creative about how we can shift that left to get the feedback that Dave mentioned. Right, thanks for mentioning some of these techniques. I'm always intrigued, like you
mentioned, right? How can you test something that is non-deterministic, with so many variables that could come into play, right? So thanks also for bringing up the importance of automated testing. I like what you mentioned when you explained that there's a little bit of information asymmetry.
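Asserting on intent rather than vocabulary can be sketched as a matcher-style check. With PyHamcrest itself, this idea is usually expressed as a custom matcher that subclasses its `BaseMatcher`; the stdlib-only Python sketch below mirrors that shape without the dependency. The intent labels, the `HasIntent` and `assert_that` names, and especially the keyword-based classifier are hypothetical: the stub exists only so the example runs offline, and a real team would plug in something like an LLM-as-judge call or an embedding-similarity lookup.

```python
from typing import Callable, Optional

IntentClassifier = Callable[[str], str]


def keyword_intent_stub(text: str) -> str:
    """Crude stand-in classifier so the sketch runs offline.

    Assumption: in practice this would be an LLM-as-judge call or an
    embedding-similarity lookup against labelled example responses.
    """
    lowered = text.lower()
    if any(word in lowered for word in ("sorry", "unfortunately", "unable")):
        return "decline"
    if any(word in lowered for word in ("refund", "money back", "reimburse")):
        return "offer_refund"
    return "other"


class HasIntent:
    """Matcher-style check: compares the intent of a response, not its wording."""

    def __init__(self, expected: str, classify: IntentClassifier = keyword_intent_stub):
        self.expected = expected
        self.classify = classify
        self.actual: Optional[str] = None

    def matches(self, response: str) -> bool:
        self.actual = self.classify(response)
        return self.actual == self.expected

    def describe_mismatch(self, response: str) -> str:
        return f"expected intent {self.expected!r}, got {self.actual!r} for {response!r}"


def assert_that(response: str, matcher: HasIntent) -> None:
    if not matcher.matches(response):
        raise AssertionError(matcher.describe_mismatch(response))
```

Two differently worded responses, such as "You can get a full refund" and "We'll send your money back", both pass `HasIntent("offer_refund")`, so the test tolerates non-deterministic phrasing while still failing when the model's intent changes.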
Maybe there are some ML engineers who have never been exposed to some of these techniques, and once they know about them, they can follow the discipline and make sure that their products get better and better. And I also like the approach of, you know, slicing the data for
different stages of tests. I think that's also key in making sure that the ML project still behaves as we expect, because, for example, you can tweak the model a little bit and the output can change so much, right? We don't want that to happen in production.
¶ MLOps
The other aspect of ML that people always talk about lately is MLOps: building platforms, how to actually deploy a model and operate it. So maybe a little bit here: what do you think about MLOps? Is it something that we all need for building ML products, and what problem does it solve? Yeah, I think it's yet another tool in our toolkit, which I very much welcome. Back in the day we had to wrangle with and think about how to solve large-scale distributed
processing. But now there are MLOps tools that let you abstract away that concern. In one case, a team built an ML platform where data scientists, or anyone who doesn't know anything about infrastructure or AWS or Kubernetes, can just write plain Python and say, I want this large vertical scaling here, I want to fan out there, and all of that starting from Python. Compute is one part of the MLOps
stack. There's also experiment tracking, which has really helped us as well. Every pull request runs an experiment that reports some results, so we can check over time, if David creates a new PR, whether it's better than our champion model. So that MLOps practice was really, really welcome as well. The challenge here is that there are too many tools and it's hard to navigate. Thoughtworks had an article called A Guide to Evaluating ML Platforms, which we could link
in the show notes. That really helped with thinking about capabilities: some platforms try to do everything, some are narrow. So how do you pick what's right? And how do you avoid things like shotgun surgery and vendor coupling? But yeah, at the end of the day, I think MLOps is about abstracting away complexity so that you can focus on solving the right problem, rather than having to deal with undifferentiated labor
in your day-to-day work. And, coming back to the testing perspective, one of the things we highlight in the book is that to get the most out of MLOps automation and abstraction, you also need to ensure that you're doing the right testing, to give you confidence that when you're moving fast, you're doing so safely.
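The champion-versus-challenger check on each pull request can be reduced to a small, explicit promotion rule. The Python sketch below is hypothetical: experiment-tracking tools record the metrics for each run, but the `RunMetrics` type, the choice of precision and recall, and the thresholds are assumptions for illustration, not any particular tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMetrics:
    run_id: str
    precision: float
    recall: float


def promote_challenger(champion: RunMetrics, challenger: RunMetrics,
                       min_gain: float = 0.01, max_recall_drop: float = 0.0) -> bool:
    """Promote only if precision improves by at least min_gain
    and recall does not drop by more than max_recall_drop."""
    precision_gain = challenger.precision - champion.precision
    recall_drop = champion.recall - challenger.recall
    return precision_gain >= min_gain and recall_drop <= max_recall_drop
```

Writing the rule down as code makes the trade-off reviewable in the PR itself: a challenger that buys precision by sacrificing recall is rejected, as is one whose gain is within noise.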
Yep, and to add to that, that's a key point we make in the book: you can't MLOps your problems away, just like you can't DevOps your problems away. On your last episode you had Laura Tacho on the show talking about developer experience, and I really like the diagram of the DevEx triangle: how do you get faster feedback loops? How do you manage cognitive load? How do you get into the flow
state? MLOps helps in a few ways, but MLOps is not going to write your tests for you. It's not going to talk to users and make sure you're building the right features. It's not going to make sure your code is nicely factored and readable so that you can stay in
the flow. So, yeah, I think it's another tool, but it needs to be coupled with the other disciplines that Dave and I have mentioned in this podcast and in the book. Yeah. Yeah, thanks for the plug for developer experience as well, right. So don't forget that any kind of ML product is essentially also a software engineering problem, right? It's socio-technical. So don't forget the aspects of feedback loops and the psychological safety you also mentioned at the beginning, right.
Like I mentioned at the very beginning, it's not just the ML technicalities that you need to understand; in the end, it's a software engineering thing that you have to handle really, really well.
¶ Make Good Easy
So we have talked about a lot of things. As we move towards the end, is there anything that we haven't covered that you think should be mentioned? I think one key takeaway is: how do we make good easy, as engineering leaders and as ML practitioners? We've talked about a lot of different practices.
If we can make good easy, then teams get it just by following the team practices and exemplar repos: you get it for free in your CI/CD setup, your test strategy, and maybe even hygiene checks like talking to users. Have you put a business case together before you start asking people to work on this for six months? It can get out of hand easily with so many moving
parts. So as an engineering leader, how do we make good easy, so that for teams on the ground, when they get the mission to do a certain piece of work, it's built into the way of working? Yeah. So I like that: make good easy, right. Sometimes we all get excited about the technology, so many moving parts, so many technologies that we can play with, right. But we forget to make it easy for people to adopt, and easy to get the buy-in as well.
So thanks for mentioning that.
¶ 3 Tech Lead Wisdom
So it's been an exciting conversation. I learned a lot about what it takes to build ML projects, which I find really complicated. As we reach the end of our conversation, I have one last question that I'd like to ask you. I call this the three technical leadership wisdom. Just treat it like advice that you want to give to us as listeners. So what would be the three technical leadership wisdom that you can share with us?
Shall I go first? I think, again in line with some of the philosophy of the book, being able to take different perspectives on technical problems is really key for your leadership growth. So look for opportunities to play different roles in projects, even for a short time. It gives you an understanding of what other stakeholders require and how to make them successful as you aim to be successful yourself. Yep, I thought of two. The first one I call focus, function, and fire.
This is an idea I got from Todd Henry, in his book, I think, Herding Tigers. We are building things every day, and teams can get distracted by many things. So how can we set our team up
for success? How can we give them focus: a clear mission, the milestones, the why, what we're doing for the customers, the benefit for the business, the business case? How will this benefit the metrics that we care about? Function: the way of working. Instead of throwing work or communications over the wall to other teams, have we got the right way of working? And fire: the kind of intrinsic motivation. Why are we doing this?
How is this helping people? How is this helping the business? So that, for me, is one principle I take to my teams. The second one is really interesting, and we covered it in the book as well: trust and psychological safety. When it's absent from the room, a simple conversation becomes a bureaucratic process. OK, I've got to write up this documentation, we've got to pre-read it, and let's have a meeting in two weeks to discuss.
And when it's not there, team members are also afraid to voice concerns about how things might fail, so we continue to trek down the wrong direction. So as tech leaders, how do we embody, encourage, and ensure psychological safety in the team so that we can all do our best work? On David's final point, I would say that the idea of trust is also really important in multidisciplinary
teams and innovative initiatives. Under conditions of uncertainty, we want anyone to be able to speak out with good ideas, or with concerns that things might be broken. Being able to create those serendipitous moments, as well as the well-understood moments that trust facilitates, is a pretty key focus for leaders. Yeah, I like the last aspect that you mentioned about
psychological safety: if it's not there, things tend to become bureaucratic, right? I think that's a really good point. So if people want to learn more about the exciting things that you mentioned in the book, or if they want to discuss with you, is there a place where they can find any of you
or both of you online? Yeah, you can find me on LinkedIn, David Colls. Yep, and I'm David Tan, and we can share a link in the show notes. The preface of our book, Effective Machine Learning Teams, is available for free at a link that we can share in the show notes. It gives you an overview of everything we've talked about in this podcast in seven pages. And for the book itself,
we can link in the show notes where you can read or listen to it. Thank you so much. It's been a pleasure to have both of you Davids on the show. I hope people learn a lot about the aspects of machine learning and, in the end, good software engineering practices anyway. Yeah, thanks so much, Henry. This is a great chat. Thanks for having us, Henry. That was great.
