#225 - Driving Engineering Excellence with Platform Engineering and IDP - Ganesh Datta | Tech Lead Journal podcast

⁠¶ Trailer & Intro

00:00

I imagine if a sales leader said like I don't need Salesforce, I don't need CRM, we're just going to sell and hope for the best. Like you would fire that sales leader tomorrow. So why is it that engineering leaders are not treated the same way? Every single other function has a system of record, right? Like sales teams have CRM, finance has ERP, but what does engineering have?

00:18

It's like the Wild, Wild West. Today I have with me Ganesh Datta. He's the co-founder and CTO of Cortex.io. Reliability practices, security practices, efficiency practices, and velocity practices. Those four are the kind of like pillars of engineering excellence to us. Engineering excellence basically means like the alignment of engineering practices with business outcomes. Think about those business outcomes that engineering excellence ties into. It tends to bucket into these three things.

00:44

It's time to value, time to market, and innovation. It's not just about the metric. What you need is like an end to end platform that helps you understand like your software ecosystems. At what point in engineering stage do you think everyone should try thinking seriously about having this platform engineering and internal developer platform? You need to start with data. Do you have a system where you can track your software assets,

01:04

infrastructure assets? A single place where you can determine accountability and ownership. What are some of the misconceptions people have when they think they want to adopt a platform? One of the biggest misconceptions with portals in particular is, do I have to get all my data in order first before I put it into the portal? The second misconception is... Hey guys, welcome back to another new episode of the

01:40

Tech Lead Journal podcast. Today I have with me Ganesh Datta. He's the co-founder and CTO of Cortex.io. So today we'll be talking a lot about how to build a great engineering team, best in class. So welcome to the show, Ganesh. Thanks for having me. Right, Ganesh, I always love to

⁠¶ Career Turning Points

01:58

start by maybe having my guests explain about themselves, maybe any career turning points that you think we all can learn from. So my name is Ganesh, I'm one of the co-founders and CTO of Cortex. Like you mentioned, I was a software engineer before starting Cortex. I was at a fintech startup and I got to see the monolith-to-microservice journey while I was there. And it was a very interesting opportunity because there were quite a few turning points.

02:24

And I'll kind of give you the story of the journey that I went through and some of those turning points. So when I started at this company, I was working on a team and there was an initiative to break the first microservice out of the monolith. I eventually got added to that team and it was interesting because at the time that particular project was successful, but kind of struggling. And so like, it was a first

02:49

microservice. There's a lot of learning, a lot of things we had to do. And it was an interesting time because at face value it was like, OK, a lot of things are broken, like lots of bugs, like a lot of bug fixing. And especially as an early career engineer, it felt like, Oh my God, I'm just like fixing bugs all day long. Like, what's the point here? But I would say it was actually a turning point in my career because it allowed me.

03:13

And this is where I think my manager's coaching helped a lot was, hey, like, don't look at these as just bugs, right? Like if you want to become a, you know, very senior engineer one day, the whole point is to be able to look at these things and say, what are the patterns we're seeing here, right? And so like, I think that framing really, really helped. I was like, OK, well, I'll stop thinking about them as bugs and let me start thinking about them as patterns of issues in this

03:36

particular microservice. And so eventually I, you know, saw a ton of patterns and I got the opportunity to propose like a new architecture for the way we were computing, like account management data for credit cards and loans and things like that. And so we got buy-in to work on this big project. And it was great because having had that personal pain of going through all these different bugs, I knew exactly the impact it would have to do this re-architecture.

04:01

And like I know every engineer wants to do a re-architecture, but I think the point was I was able to articulate a very clear business value and like the types of bugs we were running into and the customer impact and whatnot. And so I got to take on that project and there was a ton of learnings around that, you know, the way you communicate it, the way you set timelines. So I would say like that was a really pivotal project for me.

04:21

Another thing that, you know, I got to do while I was there was a lot of performance-based things. It was like these small issues that kept popping up. And it was an opportunity for me to be much more proactive. Like, without anybody saying, like, you just go in and dig through Datadog and say, like, where are we seeing like our highest P95s and, you know, P90 requests and what are the patterns? What are the bottlenecks we're seeing? And can we start attacking those more aggressively?
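As a rough illustration of the latency digging described here (not code from the episode), the sketch below ranks endpoints by a chosen percentile. It assumes the request latencies have already been exported as `(endpoint, latency_ms)` pairs; the function names, the endpoint names, and the nearest-rank percentile method are all assumptions made for the example, and a monitoring tool may compute percentiles differently.

```python
from collections import defaultdict


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(samples)
    # Ceiling division with integers avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def slowest_endpoints(requests, pct=95, top=3):
    """Group latencies by endpoint and return the `top` worst by percentile."""
    by_endpoint = defaultdict(list)
    for endpoint, latency_ms in requests:
        by_endpoint[endpoint].append(latency_ms)
    ranked = sorted(
        ((percentile(values, pct), endpoint) for endpoint, values in by_endpoint.items()),
        reverse=True,
    )
    return [(endpoint, value) for value, endpoint in ranked[:top]]


if __name__ == "__main__":
    # Hypothetical sample data: one endpoint with a long tail, one that is flat.
    sample = [("/accounts", ms) for ms in range(1, 101)] + [("/loans", 10)] * 20
    for endpoint, p95 in slowest_endpoints(sample):
        print(f"{endpoint}: p95={p95}ms")
```

Sorting by the percentile rather than the average is the point of the exercise: the long tail is what an on-call engineer feels, and it is invisible in a mean.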

04:45

And so in my spare time, I would go and try to fix these issues, and eventually it was a really fun thing we would do: for every endpoint where we fixed our P90s and P95s, we would print out a chart of like the latency graph and I would stick it up on the wall next to me. So eventually I had like a trophy case of these like latency charts. It's like, look at all the things that we fixed, you know, over the last year or so.

05:06

And it was just a way to show like, hey, I'm, it's a way for you to represent the kind of like sloping work that you have to do to keep services and systems up and running. So I would say that's another turning point. And the last one I'll say was really investing time and energy into how can I take all the things that we learned and share

05:23

them across the organization. And it was another thing that was just like it came from personal pain, which is I, as an engineer that was on call for a service, I would get paged for something. I'd go to another service and I'd try to like figure out what's going on. It was like a downstream or upstream service on an issue. And like the logs look totally different. The metrics are named completely different. Like, why is my service doing one thing and your service is

05:44

doing another thing? Like it sucks to manage these services. I took it upon myself to say, OK, we're going to standardize this thing. And it was more out of frustration than anything else. I was like, we just got to do things the same way. Otherwise I'm going to like, I'm going to lose it. And so I put together this like what I eventually realized was a production readiness checklist, but it was just like a list of best practices and standards and like logging formats.
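To make the idea of a production readiness checklist concrete, here is a minimal sketch of one expressed as data plus simple checks. It is not the checklist Ganesh built; the `Service` fields, the check names, and the choice of JSON as the shared log format are hypothetical, chosen only to mirror the kinds of standards mentioned (ownership, logging formats, request tracing).

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical description of a service; every field here is an assumption."""
    name: str
    owner: str = ""
    has_runbook: bool = False
    log_format: str = "text"
    request_tracing: bool = False


# Each check is a readable name plus a predicate over a Service.
CHECKS = {
    "has an owning team": lambda s: bool(s.owner),
    "has a runbook": lambda s: s.has_runbook,
    "logs in the shared JSON format": lambda s: s.log_format == "json",
    "request tracing enabled": lambda s: s.request_tracing,
}


def failed_checks(service):
    """Return the names of the checklist items the service does not meet yet."""
    return [name for name, check in CHECKS.items() if not check(service)]


if __name__ == "__main__":
    payments = Service("payments", owner="team-a", has_runbook=True)
    for item in failed_checks(payments):
        print(f"{payments.name}: missing '{item}'")
```

Keeping the checks as plain data makes it cheap to add a new standard later, which matters when the list starts small and grows as new patterns turn up.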

06:05

And I was like, OK, nobody's following these practices. So I'm going to make it really easy to do these things. I built like standard libraries that like would pre configure your request tracing and your logging. And then we built like cookie cutter templates to spin up these services and without realizing it was like I was kind of doing these like grunt work, dev experience type things. And now obviously like teams have come up around that, but

06:25

at the time it was not really a thing. And it was an example of like, hey, if you can make your peers' lives better, you will get that. You will earn that respect. You will earn the right to kind of take on bigger and bigger projects. And like, don't wait for permission for a lot of these things. Like you can make things better. And so that was another turning point for me. And like, the reason I tell these stories is also it helped me in my own personal career and growth as a software engineer.

06:47

But those were the things that led me to Cortex as well. So like the idea of like, hey, how do we help bigger and bigger organizations do like production readiness and ownership and standardizing the way you operate and, like, drive excellence and standards, like that became a bigger and bigger question for me until it was like, I got to do this a better way. And that's kind of how Cortex started. So it was kind of a series of, you know, turning points for me and my career.

07:07

And then eventually the biggest turning point was like I quit and started Cortex. So like, yeah, those those are kind of the things that I found really impactful as part of my journey. Thank you for sharing such a good story, right? So I feel there are many things that we can learn just from your sharing, right? The first is about like looking for patterns, right? Maybe it could be bugs, could be issues, incidents, whatever that

07:26

is, right? Because sometimes when we are tested in those moments, right, we feel frustrated or why is this happening to me? But actually, if you want to level up and step up, right, So you can probably solve the root cause instead of just the bugs, the symptoms of the issues, right? And I think learning, learning and sharing, I think that's really a good thing as well for any engineers, right? It could be internal, it could be also publicly, maybe through blog post or whatever that is,

07:49

right? So maybe I would like to pick a

⁠¶ The Practice of Finding the Patterns in Issues

07:51

little bit about finding the patterns, right? Because I'm sure in most engineering teams, they will always have these issues, incidents that they have to face, especially when things are in production, right? So for engineers that want to, I don't know, improve their skills in finding the root cause and fixing it, what will be your advice to, you know, look for patterns instead of just fixing the issues? Is there any practices or things

08:15

that you do normally? Yeah, I think it's important to be very systematic about it. If we're a marketing team and we're trying to find like which cold emails work best, we're not just going to like look at our emails and pray like, OK, well, let's hope we get the best email. It's like, OK, we're going to send out three different types of emails and look at the response rates. And like you're very systematic about it. Like, why not do the exact same

08:38

thing for engineering practices? And so if you were like a very junior engineer and you're feeling frustrated by incidents, let's take an example. One of the things that you can do is think about, OK, am I frustrated at the fact that I'm being alerted? Am I frustrated at like constant interruptions? Am I frustrated about the lack of information when something goes wrong? Like, something is going wrong and I don't know what to do about it.

09:04

Something is going wrong and the same thing has happened multiple times. Nothing is going wrong and I'm being alerted about it. Nothing is going wrong, or it looks like nothing is going wrong, but something is going wrong. Then you realize like there's kind of these buckets you can bucket it into. And so I was like, OK, I'll start there, but how would I like, assess my own view of the world in some sort of like

09:22

meaningful way? And it doesn't have to be perfect because if you start with some simple like categorization of these things, like as you go through, you will naturally realize, hey, there's actually another class of things that I didn't think about. Let me go back and like iterate on my patterns, the patterns I want to match against and so on. And so like starting with like even something rudimentary is

09:38

really important. The thing that, you know, I started with as a very junior engineer was like alert categorization. It was like this bucketing of alerts. And I was like, OK, like I don't even know what the patterns are, but I know that there's like general things that like I'm getting alerted for that I

09:51

shouldn't be or whatnot. So I went through and I got a list of like all the alerts in the last three months and I literally just went through and marked them like, this shouldn't be an alert, this should be an alert, this shouldn't be an alert, here's a runbook, blah, blah. And then shared it with the team. I was like, I don't even know if I'm doing this right. Let's get the team's feedback.
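The alert review described above can be sketched in a few lines. This is an illustration under stated assumptions, not the spreadsheet Ganesh actually used: it assumes a list of alert names (one entry per time the alert fired) and a hand-written label per alert, and the label names `keep`, `remove`, and `needs_runbook` are hypothetical.

```python
from collections import Counter, defaultdict


def review_summary(firings, labels):
    """Group alerts by review label, noisiest first within each group.

    firings: list of alert names, one entry per time the alert fired.
    labels:  dict mapping alert name -> label assigned during the review.
    Alerts nobody has labeled yet land in the "unreviewed" bucket.
    """
    by_label = defaultdict(list)
    for name, count in Counter(firings).most_common():
        by_label[labels.get(name, "unreviewed")].append((name, count))
    return dict(by_label)


if __name__ == "__main__":
    # Hypothetical three months of alert history.
    firings = ["disk_full"] * 2 + ["cpu_spike"] * 5 + ["queue_lag"] * 3 + ["heartbeat"]
    labels = {
        "disk_full": "keep",
        "cpu_spike": "remove",
        "queue_lag": "needs_runbook",
    }
    for label, alerts in review_summary(firings, labels).items():
        print(label, alerts)
```

The output is meant to be something you can put in front of the team for feedback, so the labels stay editable data rather than logic buried in code.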

10:05

And it was like, hey, actually this alert is really important because of X or this alert doesn't make sense. We can like quiet it or, you know, it's kind of like that, that conversation of why are we even doing these things? And then you go back and you kind of iterate and it's like, OK, well, if this alert's important, then why is it frustrating? Is it because I don't know what to do about it? Then you go back with the proposal and like, I think that iteration loop is really

10:24

important. I think the other thing to do is like, honestly, I mean, now with LLMs, you can even like take a list of your Jira tickets and put it into an LLM and say like, hey, like, tell me, what do you see here? Right, like it's much easier now than I think it was like 10 years ago. But the ability to kind of go through and say, if you're on a product team, can you categorize the types of things you're working on into the highest

10:47

level abstraction? It's like basic software engineering practice. You're like, what is the highest level abstraction possible, before you go really low? Like, OK, these tickets that I'm working on, these bugs I'm working on are on this part of the product and this part of the product. OK, let me go into that part of the product. Now make it a little bit more granular, a little more granular.

11:02

It's like, OK, well now I'm not able to break it down into any patterns, but you naturally found patterns that way by like starting at the biggest abstraction layer and whittling it down more and more granular. Then you can work your way up and see how things relate. And so I think going through the like that systematic exercise is really important.

11:16

Yeah, so many great tips for those of you engineers who want to kind of like improve your skills in terms of maybe we call it systems thinking also sometimes, right, how to actually identify this kind of root cause patterns. And I like the tip where you mentioned using LLMs, right. So maybe give it a try. You know, an LLM is sometimes very good at summarizing things and seeing things that we don't see normally. So I think that's a good tip as

⁠¶ The Definition of Engineering Excellence

11:39

well. So next, today we plan to talk about building a best-in-class engineering team or building engineering excellence within that team, right? So maybe could you start a little bit by defining first what is engineering excellence to you, now that you are a CTO at a company and in a previous job as well you have seen like big engineering teams operate. How do you define engineering excellence? It's a great question.

12:01

The one thing I'll add is like, I feel like I have a unique perspective here because of having been a software engineer, being a CTO now, but then also the kind of goal of Cortex as a company is to help our customers with their engineering excellence initiatives. I've seen like how the world's best like e-commerce companies and financial services companies, healthtech companies have tried to do this as well.

12:20

And so like as I, you know, answer your questions, I'm going to try and weave in like the things that I've learned from our own customers and, like, the other CTOs that we work with. I think to us engineering excellence basically means like the alignment of engineering practices with business outcomes, right? So like of course, every engineering organization needs to build features and products and all that stuff.

12:39

Like that's table stakes, but it's the practices that we adopt as an engineering organization that lead to the outcomes that we care about. And so again, kind of going back to the sales and marketing analogy, like sales excellence is like given an account, given an opportunity, how likely am I to close it at what dollar amount, right? Like that's like the obvious business outcome. There's practices I can do to do that, right? It's like, am I asking the right questions?

13:04

Am I talking to the right personas? Am I, you know, using the right decks? Am I, you know, helping them see the value of the product properly, like all the right practices. It makes it more likely for a customer to see value and buy the product, right? And like sales organizations have been doing this for like, you know, a century, at least at this point of like, you know, it's a very well studied thing like bookkeeping and like tracking your like CRMs.

13:23

The idea of CRMs existed way before Salesforce, right? Like this idea of like, if we can build a high functioning machine and we like really operationalize the way we do sales, we're going to drive more revenue. Engineering is like slowly coming to that same realization of like, hey, if we build the right systems and the right processes and the way we do engineering, we will get those

13:43

business outcomes we care about. And so when I think about the business outcomes that engineering excellence ties into, it tends to bucket into these three things. And this is what we think about as well. It's time to value, time to market, and innovation. So are you innovating as much as you can? Are you getting your product into the market as quickly as possible? The second one is around cost efficiency. So like are you managing your costs and being as efficient as you can?

14:05

Like are you doing more with less? And the third one is around product quality and customer reliability. So like are we delivering a world class product that is reliable, you know, is the customer happy with it? I use these three examples because depending on where you are in your company journey, what type of company you are, the outcome you care about might be totally different, right? Like as a startup at the very beginning of the Cortex journey, all we cared about was time to

14:26

market. Like it was like, we don't care if there's quality, like it's going to break, but we want to see what works and we want to see what people care about. So like, let's move really quickly. And then from there it became like, OK, well now we have a ton of customers. They really care about reliability. We want to build a great customer experience. Time to market is still really important and we don't want to lose that.

14:43

But it's an ebb and flow. Like OK, these next couple months we're going to really focus on reliability. That's the thing that really matters. But you take like a larger organization, maybe they care about cost efficiency. Like, hey, we hired 1000 engineers in the last like 3 years and like, you know, there's a lot of pressure from investors and blah, blah, blah. And we got to like, you know, focus on cost. We're going to really double down on that.

15:02

You know, maybe a company's trying to go IPO and they need to get their cost in order. There's some outcome that your processes can relate to. And so engineering excellence is like the connection of engineering process to those outcomes. We bucket those. In my head, when I think about engineering excellence, I break it into kind of these 4 buckets underneath that. So it's like it's reliability practices, security practices, efficiency practices and

15:25

velocity practices. And so those four are the kind of like pillars of engineering excellence, if you will. And so it's like it spans the entire SDLC, right? When you think about like developer experience, for example, it's like it's about the tools for a developer in their like day-to-day work, right? It's like very localized. Engineering excellence is more broad. It's like architecture review process to incident management to security. Like all those things are like the excellence of your

15:49

engineering organization, right? And like developer experience is part of it. It's like, hey, are we giving our developers tools and making it easy to be excellent? But it's like not the entire thing. And I think that's where I really focus on engineering excellence because in some cases, like maybe the developer experience sucks, but like right now it's a trade off I'm willing to make because we need to improve our security or

16:08

something like that. So thinking about like the business outcome first, the part of the engineering process that I want to change to drive that outcome and then thinking about how I'm going to do it. Like that's to me what engineering excellence is. It's like our practices in service of business outcomes. Well, thanks for outlining such a great, you know, the way you pitch it like from outcomes to pillars. So I think at the end of the day, it's all about business outcomes, right?

16:30

Engineering is just one organization within the company. Obviously, engineering needs to also drive business outcomes, right? So in many various companies, I believe there are many challenges. I think what you mentioned, right, time to market, right, innovation, right, bringing some new features, maybe even new things into your system as fast as you can. And then the second one is about reliability and quality, right? Bugs, incidents and maybe security issues and things like

16:53

that. And so the third thing is about cost efficiency, which I think in many companies these days, they're kind of like looking back at the cost, how much they spend and try to, you know, optimize a little bit. So I think every company might go through this journey differently as well. Like what you mentioned, maybe startups prioritize something different than their big organizations. So maybe if we can dive slightly

⁠¶ The Leader's Role in Engineering Excellence

17:12

deeper into all this, right? So obviously, there are many dimensions to all of this, right? One of the most important things in any engineering excellence is about leadership. Maybe I'll start from there, because most likely you can't do anything if the leadership doesn't support all these things, right? So how important is the role of the leaders, do you think, in an organization such that the engineering team can

17:31

achieve this excellence? I think it is extremely, extremely important and especially because if every team that is working on engineering excellence initiatives does not know how their work ties to a business outcome, it doesn't go anywhere. But I'll give you a concrete example of this. A lot of organizations in the last two years have spent a lot of time investing in platform engineering. And platform engineering should in theory enable quite a few outcomes, right?

18:00

It should help you go to market faster. Like if you build up the right platform, your developers are more autonomous, but they're also following your best practices. Your platform is secure, it's modernized. So you're being efficient, you're using your resources effectively. So at face value platform engineering should be a great way to drive engineering excellence.

18:19

But platform engineering, especially in large organizations, sometimes can become like this kind of corner of the organization where it's not really tied to leadership or the leadership is not able to articulate to that team like how their work ties into the broader business outcome. And I think it's really important for an engineering leader to say, hey, this quarter or this year or this half or whatever, the thing that we really care about is business

18:44

outcome. Actually we want to move to market faster or we want to improve security or whatever that is. And interrogating kind of each set of initiatives as to like how exactly does this initiative tie back to that? And can I then tell that story to the rest of the organization? So like, hey, our platform engineering team is going to be heads down for three months.

19:02

You're not going to see anything, but it's because we are building out a, you know, the very first slice of our organization, very first slice of that product that is going to improve security of the way we provision infrastructure. OK, I understand how it connects back to that. As an engineering leader also challenge your teams to steel

19:19

thread. So rather than saying we're gonna build a whole platform or we're gonna deal with this whole thing and then like solve the business outcome, it's like, no, there's like choose a business outcome. What is the number one thing blocking that today? And what is the bare minimum slice across the entire stack that I can deliver to move that needle just a little bit for

19:35

right. And so it's like when I think about reliability, if, you know, MTTR is top of mind and, say, my CEO is saying like, hey, we have too many incidents. Like we're paying too much in SLA fees. We need to improve our incident costs, OK? I'm not gonna like go and rebuild all of our services and like, you know, fix all of our tech debt all at once. It's like, you know, what is the number one thing that's stopping us? OK, it's you know, we're spending too much time fighting fires.

19:57

OK, why are we spending too much time fighting fires? Well, there's like 10 reasons. Like the number one reason is like, we don't know which teams are in the critical path for these key outages. And so we spend 30 minutes per incident getting the right people in the room. After that, there's still a lot of other problems, but like we spend 30 minutes there. OK, like let's take that one level further. What is the next thing we can do to standardize that and so on.
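The "which teams are in the critical path" problem is essentially a lookup over ownership and dependency data. The sketch below shows one way that lookup could work, assuming a catalog that records an owner and a list of dependencies per service. The catalog contents, team names, and function name are hypothetical and do not represent any particular product's API.

```python
from collections import deque

# Hypothetical service catalog: every name and owner here is made up.
CATALOG = {
    "checkout": {"owner": "team-payments", "depends_on": ["ledger", "auth"]},
    "ledger": {"owner": "team-accounts", "depends_on": ["postgres-main"]},
    "auth": {"owner": "team-identity", "depends_on": []},
    "postgres-main": {"owner": "team-infra", "depends_on": []},
}


def teams_in_critical_path(service, catalog):
    """Breadth-first walk of a service's dependencies, collecting owning teams.

    Teams are returned nearest-first, so the owner of the failing service
    comes before the owners of its transitive dependencies.
    """
    seen = {service}
    queue = deque([service])
    teams = []
    while queue:
        current = queue.popleft()
        entry = catalog[current]
        if entry["owner"] not in teams:
            teams.append(entry["owner"])
        for dependency in entry["depends_on"]:
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return teams


if __name__ == "__main__":
    print(teams_in_critical_path("checkout", CATALOG))
```

With data like this kept current, the 30 minutes spent finding the right people turns into a single query at the start of an incident.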

20:16

And so it's like now up and down the entire stack, you have a single thread. And so as an engineering leader, it's really important to maintain that focus and like challenge the teams. It's like, do we have to do all this? Like what is the bare minimum thing? Like if a product leader is doing that for the product, engineering excellence is like the engineering team's product in some sense.

20:33

It's like the thing, the practices and the tools and the stuff we're building for those engineering and business outcomes. And so like we should have that same product management mindset of like, what is the minimum viable or minimum lovable, like process or, you know, tool that we can build for this particular issue.

20:48

And so I think being able to carve out small slices of value that drive to business outcomes by telling that story more broadly across our organization, making sure everyone's bought into the business outcome and holding teams accountable to not building, not just doing science experiments all the time, but like driving towards something very clear means that the next time you want to do one of those things your CEO's bought in,

21:08

it's like, hey, this investment in like this random engineering process resulted in us moving the business outcome. So the next time as an engineering organization, we want to go and say, hey, we need to fix this tech debt. You've earned those stripes. You've said, hey, the last time we made this argument, we moved the business needle. We're not lying to you. Like we have another thing we want to do. Like you've bought that, you

21:26

created the feedback loop. And so as an engineering leader, it's your duty to like bring those business outcomes from the business to engineering and then use the same business outcome to tell the story back to the business of like, hey, here's how our investments impacted the business. And so as an engineering leader, you're gonna be thinking about both sides of that equation. Well, I think that's a really great tips, right. Again, it all comes back to the

21:46

business outcomes. First try to tie to maybe like one single business outcome and identify the initiative that you want to do. And I think I like that you mentioned create a slice, right? You know, maybe people call it thin slice in product management, right? So the same thing you can do in your technology stack. And platform engineering is also like the most common examples I think these days for engineering team to try to improve things.

22:07

But obviously introducing technology itself might not be sufficient, right? Because I think many people try to use technology as the solution for improving their engineering excellence and cannot convey that to the stakeholders. I think this is one of the challenges simply because maybe there are gaps in technical understanding versus the business stakeholders who are non-technical. And the other thing is about building the story around what kind of outcomes that you want

22:30

to build, right? So for people who probably still

⁠¶ Aligning Engineering Excellence with the Business Outcomes

22:33

rely on kind of like implementing tools, be it, I don't know like cloud, AI, platform engineering, whatever that is, right? Maybe also sometimes re architecturing certain things to improve the engineering excellence. What will be your advice for those kind of people right before they approach these kind of thing with such new technologies and new architecture? Is there anything that they can do before they embark on that journey? Yeah. I think it's understanding what

22:56

the North Star is. Like, we should understand either what metric we're trying to move within our organization and which business metric we're trying to move, or even if it's not an immediately movable metric, like the hypothesis as to like the series of things that will eventually move some specific needle. And it doesn't have to be a

23:14

business outcome necessarily. Like if you are on kind of a hands-on team, you know, in security or something like that, buying a new tool, you can still tie it back to one of those pillars, right? It's like, you know, hey, I'm improving our security, improving our reliability, I'm doing these things. And like your engineering leader should then be able to tell that story to the business, but like, as long as you can tie it back to one of these pillars.

23:34

And so the way I think about it, for example, is going back to the platform engineering example, let's say you're a platform engineer and you want to spend two months building out a self starter kit for new services. On its face, it's not a particularly interesting project

23:48

to the business, right? It's like, hey, I'm trying to invest in Terraform. I'm investing in, you know, like IAM policy tools and like I'm doing, I'm buying a bunch of stuff and like setting all these things and I'm spending all this time and energy, but why does it matter? It's like, OK, well actually the reason all that stuff matters is this. And this is where like measuring the current state of the world is really important.

24:09

Today, it takes us 1 1/2 weeks to spin up a new service and go through all the approvals. It takes us four days for security review. Once it's ready to go to production, it takes us four days for SRE review, and then finally it goes to production. So on like either side of the equation, we're spending, you know, minimum of two weeks outside of like the actual writing of code. And so therefore, if I can reduce those two weeks down to days or hours, then like the ROI is there.
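The business case being made here is simple arithmetic, which can be written down explicitly. The sketch below uses the two figures from this example (a two-week reduction per service, across the 50 services per quarter that Ganesh cites for the organization); the cost of an engineer-week is a hypothetical placeholder, since the episode only refers to "X dollars", and the function names are made up for illustration.

```python
def engineer_weeks_saved(services_per_quarter, weeks_saved_per_service):
    """Engineer-weeks no longer spent on setup and review per quarter."""
    return services_per_quarter * weeks_saved_per_service


def dollars_saved(services_per_quarter, weeks_saved_per_service, cost_per_engineer_week):
    """Translate the saved weeks into money, given an assumed weekly cost."""
    weeks = engineer_weeks_saved(services_per_quarter, weeks_saved_per_service)
    return weeks * cost_per_engineer_week


if __name__ == "__main__":
    SERVICES_PER_QUARTER = 50      # figure used in the episode's example
    WEEKS_SAVED_PER_SERVICE = 2    # "a reduction of two weeks"
    COST_PER_ENGINEER_WEEK = 4000  # hypothetical placeholder, not from the episode

    weeks = engineer_weeks_saved(SERVICES_PER_QUARTER, WEEKS_SAVED_PER_SERVICE)
    money = dollars_saved(SERVICES_PER_QUARTER, WEEKS_SAVED_PER_SERVICE, COST_PER_ENGINEER_WEEK)
    print(f"{weeks} engineer-weeks per quarter, roughly ${money:,} at the assumed rate")
```

The number itself matters less than having measured the current state first; without the "today it takes two weeks" baseline there is nothing to multiply.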

24:34

So by investing in a self starter kit, you know, through the platform engineering team and buying all the right tools for it, over the 50 services that we create per quarter, you know, across the organization times a reduction of two weeks, we've saved X dollars of money and we've improved our security posture because now every team is by default registered with our vulnerability management tool, the moment they create a new

24:57

service. And so the likelihood of a breach is much lower by, you know, some amount. So like, now I've made a case for these tools from an efficiency perspective and a security perspective. And so it's not just like, oh, I'm like creating a cool, like experience for my developers. It's like, does the CEO really care about that? Like, maybe, you know, it's like, yeah, I want my developers to be happy and like, and there's a retention aspect and all those things about like

25:19

sentiment. But like, fundamentally, if you tell me like, hey, not only are my developers going to be happy, I'm saving us X weeks per service. I'm increasing our security posture and like our incident rates should go down. That's a much easier way to make that case. And the other thing like you mentioned, like coding assistants, is the same thing. It's like, OK, well, can we measure the impact of that?

25:37

That's an interesting one. Actually, though, coding assistants, just kind of a side tangent here, is like I had a customer recently tell me, it's like, hey, why are you making such a big fuss about, you know, measuring the impact of 50 bucks a month or whatever? You wouldn't ask like, does my developer really need like 16 gigs of RAM, like versus 8? Or like do they really need IntelliJ? Like can't they just use Vim? It's obvious that it's one of the tools in their tool kit, right?

26:01

It's like you wouldn't question a developer who's like trying to get IntelliJ or something like that. So why are we making a big fuss about spending 50 bucks for a tool? And it's just part of your general tool kit. It's like laptop, IDE, terminal, coding assistant, like it's part of our general tool kit, right? So it's like, it's an interesting idea. So the same thing should hold true for your engineering

26:19

excellence platform. That's another topic as well as like, you know, why are more engineering leaders not thinking about engineering excellence? But yeah, kind of a side tangent there, but hope I answered the question.

⁠¶ The Importance of Metrics in Engineering Excellence

26:30

Yeah. I think that's a great example that you outlined just now, right? So I think one thing that I picked from your explanations also in your career journey, right? So engineering also needs to be able to measure things, right? Be it in the maybe software telemetry or even like their, I don't know, velocity metrics that you mentioned or those kind of measurements that I think must be in place, right, for us

26:50

to build engineering excellence. How important is it for leaders to actually build an initiative to actually gather these kind of measurements? Because sometimes it's easy, it can come out from a system, right? But sometimes, especially when you touch multiple running elements, right? So it's very hard to actually

27:06

quantify the metrics. So how important do you think for engineering organization to actually put an effort, invest time and actually come up with metrics and maybe do like a good cadence around, you know, looking back at the metrics, how to improve them and things like that? So maybe any tips on this? Yeah. Yeah, absolutely. That's a very, very important question. So I think it is a must and I'll

27:28

explain why. Going back to the sales analogy again, like imagine if a sales leader said like I don't need Salesforce, I don't need a CRM, we're just gonna sell and hope for the best. Like you would fire that sales leader tomorrow. Like any sales leader that doesn't have a CRM that's not using that data to make judgments about the organization and where to invest is going to be laughed out of the room, right? Any data leader that doesn't have data tooling is going to be

27:51

laughed out of the room. So why is it that engineering leaders are not treated the same way? Like why, as engineering leaders, do we not say, of course I need a system of record? Like every single other function has a system of record, right? Like sales teams have CRMs, marketing teams have marketing automation, finance has ERP, IT has CMDBs, like ITSM tools, ServiceNow. Every single function has a system of record, but what does engineering have? We're just like caution to the

28:16

wind, like do whatever you want. Like every team's just out there, like it's like the wild, Wild West and it's really insane if you think about it like, you know, like every other function is treated so differently and engineering is the most expensive organization. We should be much more systematic and data-driven and methodical about the way we build our engineering organizations. And what I would challenge though, is like, it's not just about the metrics.

28:37

Like imagine if all Salesforce had was like fancy reports, right? Like just like a funnel dashboard or something, like you wouldn't need that either. What you need is like an end to end platform that helps you understand like your software ecosystem. So it's like accountability: given a service, which team is accountable for it? Where's it deployed? Where's it running? You know, how many vulnerabilities does it have? What is the code coverage? Like what is the on-call rotation?

29:01

Like where can I collect this data? How do I use that data to drive best practices? So like in, for example, in Salesforce, again, I can use the CRM to drive practices in sales. It's like, hey, before you start a POC with a customer, you need to fill out all these fields. You need to make sure you understand if they have budget, you need to make sure you understand who's gonna buy it. And like the AE has to go and fill in this information.

29:24

The idea is that you're using that data to then drive some sort of behavior like in the daily workflow of an AE, right? And so you're using it to drive practices or you're trying to use marketing automation tools for practices. And so it's not just about the dashboards. And so like the metric side of the equation for engineering teams should be like, (a), having a way to store all this data in a meaningful way, because it can't just be metrics.

29:45

It needs to be like, what is the actual like data structure on which I'm reporting? So like to us, the way we think about it is like the service catalog, the infrastructure catalog, a catalog of all this information, right? Like just like a CRM. The second part is the ability to like drive behavior around it. So like defining this is what good looks like. This is where I need teams to be. Can we create a shared language as to what good looks like? Because I can say your repo has

30:08

30 vulnerabilities, OK? Like is that good? Is it bad? Like it doesn't matter if it's high or like low vulnerabilities. Like what does that actually mean? Like it doesn't mean anything for me to tell you have 30 vulnerabilities, right? There needs to be some distinction for our organization. Like for me, for us as a company, what does good look like? OK, for a financial services company, zero critical

30:29

vulnerabilities is great. Zero critical vulnerabilities with, you know, resolving them within seven days is excellent, right? And like zero critical vulnerabilities, resolving them within seven days and you're resolving medium vulnerabilities within 60 days is like top of the line. Like that's where we want to aspire to. So again, now not only do we have a metric, we have like a graduated window of like how to get better. So like the platform should be

30:52

able to tell you that. And the final bit is reporting. It's engineering metrics. And it's like, OK, like based on the data that we have, based on the practices that we're following, are we seeing the impact on the data? So like cycle time and, you know, DORA metrics and things like that, those things can't live in isolation, right? It's like you can go and tell a developer improve your MTTR. It's like, OK, like I don't know where to start, right?

31:13

So it's like, instead it's like, hey, we're using MTTR as a measurement. And so we're gonna see, we believe that in order to improve MTTR, we need to change these five practices. So we're gonna go change the five practices. We're gonna come back and see the MTTR change. So the engineering metrics are not about measuring productivity or like, you know, like checking if you're like taking advantage of your teams. It's about hypothesis, hypothesis generation and

31:37

hypothesis testing, right? It's like based on the metrics, I have a hypothesis that like we can, it's all about like continuous improvement, right? It's one of the pillars of engineering excellence. We consider it's like continuous improvement. It's not like good versus bad. It's, I'm better than I was yesterday, but that's a very important part of engineering excellence. And so like engineering metrics should be used to say, today our cycle time is X, we want to move faster.

32:00

The time to first review is really long. Our hypothesis is that the CI build time is taking up, you know, a good chunk of that effort. By reducing our CI build time, our cycle time will go down, which means that our throughput will go up. That's the hypothesis. And then you go and you say, OK, well, how do we, you know,

32:16

change our build time? Well, then you run an initiative and this is where of course it's gonna help, is like, OK, we're going to migrate our teams from, you know, legacy Jenkins servers to GitHub Actions. So we have better caching. But like in a big organization right now without a tool like Cortex, not to sales plug, but like it takes a lot of time to

32:31

track a migration like that. And so having the right tooling in place to like track a migration like that and like make sure everyone's doing that migration. OK, you do that migration and then you go back to your metric and say, did it move the needle? So the engineering metrics are a way to like create those hypotheses and test them. But if you're not doing that, how do you know if your efforts are actually moving the needle, if you're actually making an impact?
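That hypothesis loop, measure, change a practice, measure again, can be sketched as a simple before-and-after comparison. Everything below is illustrative: the function name, the sample cycle times, and the 10% threshold for "moving the needle" are assumptions rather than anything stated in the conversation or taken from Cortex.

```python
from statistics import median


def needle_moved(before_hours, after_hours, min_improvement=0.10):
    """Return (moved, relative_improvement) for median cycle time.

    min_improvement is a hypothetical threshold: the fraction by which the
    median must drop before the hypothesis counts as supported.
    """
    baseline = median(before_hours)
    current = median(after_hours)
    improvement = (baseline - current) / baseline
    return improvement >= min_improvement, improvement


# Made-up cycle times (hours per pull request) before and after the CI migration.
before = [40, 48, 52, 60, 44]
after = [30, 36, 34, 42, 38]

moved, improvement = needle_moved(before, after)
print(f"moved the needle: {moved}, median cycle time down {improvement:.0%}")
```

The median is used here so that one unusually slow pull request does not dominate the comparison. A real analysis would want more samples and some care about what else changed over the same period.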

32:51

So engineering metrics are very, very important. The data under the hood is very, very important and the ability to set standards and drive behavior is also really important. It's like all those things together is what creates the flywheel of a data-driven, high visibility, high trust engineering culture that is focused on excellence. Like that's how I think about how metrics fit into the broader ecosystem. Right. I like the angle where you mentioned about continuous improvement, right.

33:15

So obviously excellence is kind of like the North Star vision that we want to be. Obviously, I don't think any kind of team can say, hey, I'm excellent, you know, like you can't be excellent forever anyway. So the continuous improvement aspect, I think it's very important and kind of like the motion of how to improve yourself from, you know, wherever you are today and then tomorrow and so on and so forth, right? So I think that's really crucial.

⁠¶ The Culture that Drives Engineering Excellence

33:35

So the other thing that you mentioned is about driving the behaviour. I believe this is touching into the aspect of culture, right? I think in an engineering organization, culture is really important. What kind of culture drive a good engineering excellence, kind of initiatives and those kind of things.

33:50

So maybe if you can elaborate, I know culture is a bit big topic, open-ended, but maybe from your experience, right, what kind of culture do you think drive this kind of engineering excellence? Yeah. So I'll start with kind of a segue from the last question. So engineering metrics: the right culture is one that is transparent and has high

34:09

visibility. And what I mean by that is like engineering metrics are not this like ivory tower thing that the, you know, your VPs and CTOs are looking at and they're like coming down and like smiting the teams based on those metrics. It's like if every team has visibility into your SLOs and your latency and stuff in APM, why can't your teams look at their own metrics and engineering metrics too, right. So it's like visibility up and down the stack.

34:30

So everybody gets access to the metrics, everybody understands this is not a measurement tool. This is a way for us to like find improvements. So that is really important. So a culture around metrics, a culture around visibility, everyone looking at the same data. So that's one part of it. The second part of it is a shared language. So does everyone actually know what good means? If you ask three people on the team what does production readiness look like, and you get

34:53

3 different answers. Immediate fail. So like having a shared language as to what good looks like, having a shared language, like this is why, you know, companies do OKRs and like, you know, North Star goals and North Star metrics and all these things. It's like if everyone at least is thinking of the same thing, right? So if you're like a, a social media company, you probably have a North Star metric. It's like, hey, we want to

35:11

improve retention. Like that's our number one thing where like this year, that's the only thing that matters. OK, well, everybody knows that. Everyone can say like retention is the only thing that matters. I can ask myself, am I actually doing something about that? So the same thing holds true for engineering process. So does everyone know what good looks like and what that shared definition is? How much can you dumb down that definition?

35:30

So like your definition of good can be shared, but if it's like 60 criteria for like production readiness, it doesn't matter, right? It's like no one's gonna remember all that stuff. It's like that Confluence page over there. And so like the ability to create like a clear path to greatness I think is really important. It's like, hey, these are the basics. This is like, you know, the non

35:49

negotiables for production. This is what like the gold standard looks like. To be able to kind of like bucket up what good looks like in these categories I think is really important because then you, as a team lead, can say like, hey, we've all agreed that like these 15 things are like the bare minimum to go to production. We're not doing those things. And so like I can use that now

36:07

as a way to advocate for time. So like I can go to my manager or my VP and be like, hey, we all agree that these 15 things are really important. We're not getting time to meet those things. Like we need time. And so now you've created like that shared language. The VPs cannot say like, I mean, yes, they can say no, but the idea is that yes, we have agreed to those things. I can make that trade off very clearly. And so that communication is very important.

36:28

The third thing is repeated messaging. And so how much of the same data can you use in multiple places? So if we're doing a migration, like I mentioned earlier, talk about it in your all hands, talk about it in your monthly operational excellence reviews, talk about it in your team meetings and team retros, right? Is there enough data in whatever you're using from a tooling standpoint for each team to be able to self report on

36:52

these things, right? It's like, this is why teams spend so much time around like OKR tracking: it's like each team can like very quickly iterate. And so it's like the CEO saying something, the CTO saying something and like it trickles down and like, I can check my own work against that. So for engineering practices,

37:08

the same thing. It's like does your team, your manager, your director, your VP, do they all have a slice of the data where they can, at their level, come in and see like, hey, how are we tracking against those practices? How are we tracking against those things? And so the ability to kind of create that, like instant visibility and repeated communication is really important. This is like a common, like,

37:25

adage in business, right? It's like you have to keep repeating things until you feel like you've repeated yourself so many times that people are sick of you. And at that point, people have finally realized what you're talking about. And so it's the same thing for engineering practices. Like, keep repeating yourself. It's like, hey, congratulate people, celebrate the wins. Like, hey, Ganesh's team has finished migrating 80% of their services to the new platform.

37:47

You know, congratulations, right? Like do you have data to be able to celebrate those wins? And like, if you say that in an all hands meeting, people are like, oh, like clearly this is important enough that it's getting a slide at the all hands, like are we actually tracking towards it and whatnot? So like, can you create that culture? It's about creating momentum. It's about creating visibility. It's about shared language.

38:04

It sounds like cult behavior, but like really like every organization, like culture in some ways is like, how can you create cult-like behaviors within an organization, like healthy and like driving the right outcomes, of course. But like, so that's kind of the framing I like to think about, kind of tongue in cheek. Yeah, so I in my experience also the bigger you are, you know, the bigger you grow, right, the more important all these things, right, especially the repeating

38:25

of the message, right? Because when you have like hundreds of engineers, for example, right, it's very difficult to kind of like align everybody. So a few things that I picked from what you're sharing, right? So high transparency, maybe have the metrics in place that you can see all the time. Shared language. I can also agree to, you know, use this kind of shared language because when you have multiple engineering teams, everyone thinks excellence differently, right?

38:47

So if you don't have a shared language, what good looks like? I think it's very difficult to align everybody. And then, yeah, repeated message, building a clear path. I think it's also important because you can say here's the initiative and let people do it. But if there's no good clear path and you kind of like streamline the process, I think it's very hard for people to achieve that as well.

⁠¶ Platform Engineering and Internal Developer Platform

39:05

So the other aspect that I wanted to dive deeper is about, you know, platform engineering because you are building, you know, one for yourself, right? So the Cortex dot IO. So I think people these days talk about platform engineering, you know, internal developer platform and things like that. So maybe the first question is at what point in, you know, engineering stage do you think everyone should try thinking seriously about having this platform, engineering and

39:28

internal developer platform? The interesting thing, I kind of break up the two aspects of this: internal developer platform, internal developer portal. There's a whole debate about like what each thing means, like when you need them, like it's the whole thing. But basically, I would argue that you need to start with data. So every organization should have some sort of data. So like, do you have a system where you can track your software assets, infrastructure assets?

39:49

Do you have a single place where you can determine accountability and ownership? You can probably get away without any of this stuff, you know, up until about like 30 to 40 engineers. I want to say like roughly speaking at that point, like you have enough kind of clusters in your team graph that having a system where you can say like, Ganesh's team is working on X, you know, Henry's team is working on Y. Like this is their, you know, sphere of accountability. They, they own these different

40:13

repos. Having that in one place is really important. You should do that like very, very early on. And so usually that comes in the form of an internal developer portal, which is like a service catalog or something like that. Around that same time, you probably want to use, you know, ideally your portal already provides this. You're also starting to measure some of the key metrics, so like cycle time, review time and so

40:33

on around 30 to 40 engineers. Anything before that, like it's a little bit of noise, honestly, like you can get some value out of it, but it's not honestly worth it. Ideally your portal gives you this data, but if not, like you're getting it in a manual way, you're tracking some key metrics. And like the focus at a 30-person team, a 30-person engineering organization I'm saying, is like probably more on

40:55

moving fast. So like cycle time, throughput, incidents, keeping the lights on, those kind of metrics are probably pretty important. From there, once you get to around like 60 to 75 engineers is where platform-esque investments start to come into play. Now it doesn't have to be a platform engineering team, but some sort of like collective group of like infra, cloud, staff engineers, like senior engineers and maybe like one or two platform people like kind of coalesce and start to work on

41:28

org wide initiatives. Because at about like 50 to 60 people, you're getting to a point where like practices are starting to diverge. Like people are trying to do things in weird ways. And so starting at that stage I think is really important. However, I will say at that point you want to be very, very tactical about the types of investments you're making. So even from a platform engineering perspective, like what is the thing that is slowing you down at a 60-person, 70-person engineering

41:52

team? So like it's usually around like deployments, build systems and tooling, infrastructure provisioning, like, you know, are we having repeatability around those things? Like it's not like the quote-unquote sexy platform engineering stuff that people are doing, but it's like tactical things that really matter. It's like if you hit a point where you're like 60 or 70 engineers, you probably have really slow builds or like something like that. You probably have like PR review

42:16

bottlenecks. You may have started the company, and I'm speaking from experience, with like ClickOps in Google Cloud or AWS, whatever. And so like around 60 engineers, you want to start automating and Terraforming or infrastructure-as-coding some of that stuff, like thinking about the next phase, like, yeah, what's going to break when I get to 100 engineers, like start to get ahead of that.

42:33

So around 60 to 70 engineers, you want to start investing in platform practices, but very tactically against specific initiatives. I think once you get to around like 120 ish engineers, like you have enough independent work streams across the organization that like true platform engineering stuff starts to have a real impact.

42:53

So like a standard set of like deploy pipelines and deploy capabilities where you're standardizing on Helm, where you're standardizing on Terraform or Pulumi or Spacelift or Crossplane or Kratix or whatever kind of platform tool you want. Like those kind of decisions start to become really important because they have like compounding

43:10

effects. And so I would say like somewhere between like the 60 to 120 range is like when platform engineering starts to become a thing. Anything bigger than that, like you absolutely need platform engineering. My hot take though, is that like at whatever point you start bringing in like SRE and security and stuff, you probably also want to start thinking about platform engineering. And I would argue we're starting to see this practice a little

43:33

bit more. But why not have those teams roll into an engineering excellence leader rather than like all of them, like kind of reporting to different places. You want your platform engineering teams to be working with your security teams and your SRE teams, right? All their work depends on each other. Like your SRE team cannot build reliable software if your platform doesn't take that into account, right?

43:54

Or like security cannot do their job if your infrastructure as code tooling is not thinking about security from day one. So like, why is platform engineering over here and security over here? Like bring those things closer together. An easy way to do that is to like bring them together into an engineering excellence team. So like instead of having a head of developer experience or a head of SRE or whatever, like have a head of engineering excellence and like have these

44:16

orgs roll up into that. So you have like, especially up until like you're like a really large enterprise, at which point each of those things ends up becoming their own, like, VPs, have them roll into a single leader. So that way like all their work is really, really aligned to each other. It makes hiring easier as well. So that's kind of like my hot takes from like an organization

44:34

design perspective. Yeah, I think thanks for sharing such a, you know, good analogy of like how big is the engineering team before you start thinking about certain things? And especially when you have hundreds or even more thousands of engineers, having such kind of a platform capability I think is really important because otherwise people will be doing the same thing with different ways.

44:52

And you cannot rationalize when things going wrong, right, or especially when you want to improve something, you kind of like double the effort and things like that. So definitely when you grow in size, you need this.

⁠¶ The Biggest Misconception of Platform Engineering or IDP

45:02

So maybe from your experience building this internal developer platform, what do you see as some of the biggest misconceptions people have when they think they want to adopt a platform, maybe some kind of platform engineering or IDP, right, versus actually what

45:16

actually IDP is? Yeah. So kind of going back to the data question, one of the biggest misconceptions with portals in particular is do I have to get all my data in order first before I put it into the portal, Like to get any value out of the portal? Like if I want engineering metrics, if I want scorecards and like best practices and all these things, do I have to clean all my data up first before I buy a portal or am I wasting time with a portal? That's the biggest

45:42

misconception. And the reason I say that is because if you don't have a system that can hold data and like tell you what's wrong with the data, how are you ever going to fix it in the first place, right? It's like right now everything is an unknown unknown and you can't, you can be like, I'm going to clean up my data, but like, what are you really doing? And what does that actually mean? And so like using a portal is a way to go from the unknown unknowns to the known unknowns.

46:03

So it's like, OK, well, before we didn't know anything. Now we know what we don't know. And then now we can start to clean up the data and be like, OK, well, now we know about these things. We don't know about these things. You can actually clean up your data using a portal. Like that is the whole point of having a system of record, right? Like a sales leader when they come in. And I love this analogy. I'm gonna keep using it.

46:20

But like when a sales leader comes into a new sales organization, they're not going to say like, first we're going to take, you know, all of our calendar invites and like, put it into a spreadsheet and then clean it up and then go buy Salesforce. It's like, no, we're going to come buy Salesforce, dump everything in there. And then we're going to figure out how to like make sense of this data because I need a system that's going to help me see the world. That's the first misconception.

46:38

The second misconception is if you build it, they will come. So like people think about portal and platform initiatives as like, hey, we're going to build this like really cool stuff, like these great self-serve experiences and like it's going to be so good that people will just come and use it. Completely false. This is like basic product management, you know, product

46:57

thinking, right? It's like it's very hard for people to change their behaviors. People are used to doing things a certain way, especially, and I speak from personal experience, as developers. We have our vimrcs, like, you know, we spend so much time like creating our setups and like our practices, right? Like we, that's kind of who we are as an engineer. It's like we'd like to tailor our workflows and our tooling. And so if you're like, oh, there's this other tool, you're

47:17

using a platform. I've been doing these things in other ways. Seems fine to me. Like you go build your platform, I'll do my thing, right? It's like, that's a very common issue we see. And so instead, it's really important to think about starting with the definition of what good looks like, right? So it's like, hey, we're going to start with, and this is how we built our product, starting with scorecards next.

47:34

So it's like, OK, before we go off and build all these self-serve tools and platform engineering things, we're gonna build a scorecard that says like, hey, in order to go to production, these are all the things you should be doing, right? Like these are all the practices you should be following. One of our customers calls it like the Golden State. What is the Golden State of a service? And so if you define that, then your golden path makes a lot more sense.
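A scorecard of this kind can be sketched as an ordered list of levels, where a service only reaches a level if it also passes every level below it. The thresholds reuse the financial-services vulnerability example from earlier in the conversation. The class, the field names, and the "below baseline" label are hypothetical, and this is not Cortex's actual scorecard format.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VulnerabilityStats:
    open_critical: int        # critical vulnerabilities currently open
    critical_fix_days: int    # slowest time to resolve a critical, in days
    medium_fix_days: int      # slowest time to resolve a medium, in days


# Ordered from the basics up to the gold standard.
LEVELS: list[tuple[str, Callable[[VulnerabilityStats], bool]]] = [
    ("great", lambda s: s.open_critical == 0),
    ("excellent", lambda s: s.critical_fix_days <= 7),
    ("top of the line", lambda s: s.medium_fix_days <= 60),
]


def scorecard_level(stats: VulnerabilityStats) -> str:
    """Return the highest level whose rule, and every rule before it, passes."""
    achieved = "below baseline"
    for name, rule in LEVELS:
        if not rule(stats):
            break
        achieved = name
    return achieved


# A service with no open criticals, fixed within 5 days, but slow on mediums.
print(scorecard_level(VulnerabilityStats(open_critical=0,
                                         critical_fix_days=5,
                                         medium_fix_days=90)))
```

Because the levels are cumulative, a team always sees the single next rule standing between it and the level above, which is the graduated path to "what good looks like" rather than a flat pass or fail.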

47:55

It's like, hey, this is what every service needs to do. It's very easy to get people to buy into that, especially because most organizations have a production process already, right? Like 99% of organizations have something, it's like a Confluence page or some process. It's like, hey, this is what production ready looks like. And so you tell that, you tell your team. It's like, hey, we're not trying to introduce a new process.

48:13

It's nothing new. We're taking that old thing that we used to do, we're automating it and we're just gonna tell you exactly where you stand from a production ready standpoint. So we're just, we're doing a thing that we already used to do. We're just doing it better. Like, OK, that seems fine. Like I can buy into that. And so now everyone agrees like, OK, well, this is what production readiness looks like. Then you tell people, OK, well, you can do it your old way to

48:33

get to production ready. Sure. We built the self starter kit on our platform that like click a button, you got a new service in 5 minutes. It's going to meet all of your requirements within 5 minutes. So do you want to use the self-serve kit or do you want to go to your old thing? One in 20 people will do their old thing, but like they still have to meet all those requirements. So at least you're still getting

48:52

the same outcome. But next time those people will use the new tool because it's like, well, if you're telling me I need to do all these things, that's what good looks like, which I agree with, then I'm just going to use the new tool. So now you've told people the Golden Path and the Golden State, you brought them together. That way, your platform is serving a very key purpose. You solved a key bottleneck and you've got that buy-in from the

49:10

people. So that's another misconception of like if you build it, they will come. It's absolutely not true. That's where we see a lot of IDP initiatives fail. And last but not least, it's very important to tie to a thing that they're already doing. I kind of mentioned this in this answer. You don't want to come and introduce a completely new workflow or a completely new process. Find the thing that the team is doing already and make that better and that will get buy-in for your platform.

49:34

So a platform doesn't have to be brand new. A platform doesn't have to be one thing. A platform can be a collection of best of breed capabilities, right? Like that's the other thing, it's like we're going to deliver a whole platform, like no, no, no, like your platform can be infrastructure as code, it can be CI/CD, it can be all these

49:50

things. And you can do it iteratively, you can bounce around. You don't have to go and deliver a whole platform at once. If you think about incremental value like we talked about earlier, business outcomes, solving the thing that already exists and making that better, you will naturally say, OK, we're just going to deliver the smallest slice of a platform and continue iterating on it. So have that product management

50:09

mindset. Those are kind of like the biggest things that we see wrong with a lot of platform engineering practices today. Thanks for sharing these misconceptions. I think some people, you know, especially those who have embarked on this journey, right? I'm sure many people can relate to this, especially the

50:22

platform as a product thing. I think still many people maybe build a platform and think people will just use it, or maybe worse, they kind of force it on teams without actually, you know, explaining the reason why and how it benefits them and things like that.

⁠¶ Cortex as an Engineering Excellence Platform

50:36

So obviously Cortex is one way of doing this. So maybe a little bit of a plug about Cortex: how do you think people can use Cortex to help with all these things? Maybe some features? I know that you have this concept called engineering intelligence. So if you could tell us a little bit more, how can you use Cortex to actually solve all these kinds of things? Yeah, absolutely. So Cortex is an internal developer portal, but we like to think about it as an engineering

50:59

excellence platform. At the end of the day, it starts with the catalog. So we help you catalog all of your services, your infrastructure, your teams, and then automatically determine ownership for them. We have an ML model that can go through all of your repos and automatically assign the teams that we believe own each repo. So within, you know, 10 minutes of setting up Cortex, you have a list of all your repos and which teams are accountable for each of those.

51:23

We have a product called Scorecards which allows you to define best practices and standards. So for example, automating production readiness, tracking migrations and audits, package upgrades, security standards and whatnot. So being able to define those best practices and standards. And then from there we have a tool called Workflows. And so Workflows is basically kind of like a wizard builder where you can build experiences

51:44

for developers. So like spinning up a new service, deploys, infrastructure provisioning, just-in-time credentials access and so on. So we have a really, really easy and powerful way to build those self-serve experiences, including code scaffolding and project bootstrapping. And then finally we have an engineering intelligence product which is around engineering metrics. So like DORA metrics, cycle time metrics, MTTR, things like that.

52:04

And the idea is that you should get all of these things in a single platform. So you shouldn't have to go to reporting in one place, self-service in another place, and cataloging in another place. We bring all those things to the same platform because at the end of the day, those things

52:16

are a flywheel. So measure the metrics, figure out your practices, track them using scorecards, you know, use the catalog data that underpins all this stuff, and build self-service experiences that make those things better. So at the end of the day, that's what Cortex is. We've built a lot of native integrations. There's a lot of automation built into it. So, you know, it usually takes very, very little time to get started with it and to get those metrics and the catalog up to date.

⁠¶ Generative AI Use Case in Platform Engineering

52:39

Sounds like a really great tool, especially if you are a big engineering organization. So you mentioned something about embedding an AI model, right? I think obviously the biggest thing people talk about these days is, you know, AI, generative AI. So what do you think are some cool use cases that you have seen, maybe in your experience as well, using generative AI in internal developer platforms or platform engineering? Any kind of cool use cases that you can share today?

53:03

Yeah, that's a really interesting question. So there's some stuff that we're working on that I can't share just yet, but I'll kind of, you know, give you a high-level idea of how we think about generative AI and how it applies to the space. And then some of the cool use cases we've seen with it as well.

53:19

You know, we have been working with one of the Big Four consulting firms, and it's really interesting the way they use generative AI. As they're doing transformation projects and modernization projects, they're able to use generative AI to rapidly accelerate the code transformations, the migrations

53:38

and things like that. And so rather than going in with a ton of manual effort, they build a really powerful tool chain around platform engineering to help with the adoption of their platform. So the platform isn't about gen AI itself, but it's about, hey, we know this platform is going to help, you know, drive these different use cases and user journeys. Can't we use generative AI to migrate existing things into the new world and stuff like that?

53:58

So that was a really interesting use case. Another thing that I think is really important, and I call this example out, is I think a lot of tools have gone down the path of just these generic chatbots. Like, hey, we have all this data, let's just throw a chatbot on it with some RAG. Like, OK, it's cool. It's a nice demo, but I guess it doesn't actually do anything.

54:17

And so I think like, you know, from our perspective, the application of generative AI is really going to shine when it applies to specific use cases, specific types of data. Like that's, I think that's one way to think about it. So like internally, you know, if you're on the SRE team, we've seen folks use generative AI for like, you know, analyzing their

54:34

incidents and things like that. If you are on the platform engineering team, using generative AI to summarize your own feature backlog or requests from your end users internally. So can you find patterns of feature requests and things like that? That's been a really interesting

54:49

use case. It's like if you had a junior analyst on your team that you could give a problem to and be like, hey, can you go and figure out the synthesis, and here's how you should think about it. Those are the kinds of things that it's really, really good at. Obviously, like, you know, coding assistance and the like, but, you know, that's fairly obvious from a platform

55:04

engineering perspective. It's about getting another brain on your team that can help you reason about, like, are we working on the right things? Are we finding the right patterns and so on. So that's kind of how we've seen some really interesting use cases. Right. I don't know what you're building, but I'm guessing that if you have some kind of assistant within Cortex, right, that you can ask some questions and find patterns. I think that would be cool.

55:23

Definitely. Maybe that's one use case of generative AI. Yeah.

⁠¶ 3 Tech Lead Wisdom

55:26

So Ganesh, it's been a pleasant conversation. So I have a tradition in my podcast to ask one last question for all my guests. I call this the three technical leadership wisdom. You can think of them just like advice that you want to give to the listeners. Maybe if you can share your version today that would be great. Yeah, absolutely. My first one, and this is one I stand by very much, is be

55:47

a good storyteller. It doesn't matter if you're an engineer, it doesn't matter if you're an engineering leader. Be a great storyteller. A lot of what you do in any role is telling stories. When you're giving one of your reports advice, you're telling a story. It's like, hey, imagine if we improve these things? Like how much better could it be? Like what kind of behaviour do we want to see? Like, how do I help my reports see the light?

56:11

You're telling a story, right? You're managing up, you're flagging risk to your manager about some project. You're telling a story, right? It's not about making up excuses or fabricating a story. It's about, can I communicate a beginning, middle, and end? Can I be concise? Can I give clear information? Can I bring people with me in my thought process? So being able to tell stories is very important. Especially as an engineering leader, you're

56:34

making a big decision. How do you help the team see that? How do you, like, explain to them the trade-offs, the thought that went into this, why we're doing a certain thing? What does it mean for the business? Like tell a story.

56:44

It's really, really important. The second thing I'll say is deeply understand your product. Especially as an engineering leader, having an above-average understanding of the product that you're working on will help you make very quick judgements and reason about the space that you're operating in, right? Like you of course will lean on your own team and the experts on your team for, you know, tactical advice, and they're the experts at the end of the day.

57:11

But if you have a deep understanding of your product, then you are able to reason about trade-offs and route information much more quickly. So I think deeply understanding your product is the second thing that I would say. And I would say probably the third thing is think, and this kind of goes back to the early part of our conversation, Think

57:33

about the culture. I know that sounds like a very vague thing to say, but the specific advice I want to give is to think about your operational cadence in particular. So the rituals and the practices that you set up help dictate the way your organization operates, especially as it gets larger, right? So like, for example, if you create an operational excellence review, that itself starts to create the conversation of like, what metrics should we be looking at?

58:01

How often should we be looking at these metrics? Who should be involved? You know, are we bringing the right people? Just the fact that you have an operational excellence review creates this conversation, right? But then go a little bit more granular. OK, in the operational excellence review, what outcomes do I care about? And so like, how am I gonna structure the meeting agenda to

58:18

drive those outcomes, right? Like, imagine just the fact that you say, hey, we're gonna stop our conversation 10 minutes before the end of the meeting, and the only thing we're gonna be talking about in the last 10 minutes is action items, right? It completely changes the format of the meeting, right? You're creating a culture where it's not just about talking about things, we want to do something about it, right?

58:36

And so it's like, how can you structure your operational cadence, your rituals to create the culture that you want? You know, it's like if you want to celebrate shipping and like, hey, like we want to create momentum, create a weekly demo meeting for your engineering team, right? It's like no matter what you built, you have to come and demo. Like you just have to demo something. Even if it's an API, come show us your code. It doesn't matter. Like show your team what you're working on.

58:57

Show us, show the interesting things we're doing across the organization. That's a ritual, right? Those rituals can also be part of other teams as well. So like maybe a ritual could be that the engineering team and the support team get together for a hackathon once a quarter. So it's like, OK, well, what I want here is cross-pollination of information that's going to reduce the support load and escalations from support to engineering. Like that's the thing that I'm thinking about.

59:24

But like I'm creating a ritual that creates that natural cross-pollination between those two teams. Like what are the actual rituals that I can do on a recurring basis so that even if I'm gone, if I'm on PTO or whatever, those systems are in place, the wheels are turning, and it is creating a culture where, when I'm not in the room, the behaviors are still happening the way I think about it. So that would be my third thing is like think about culture from

59:44

the lens of operational cadence. So those would be my 3. I really love the last one, right? Because many times when you are, you know, in the mode of operations, production and all that, right? You think about just the problem solving, right? But not necessarily the culture that you want to build. And creating a cadence, be intentional, right? Creating a cadence. Maybe the rituals, you know, the meetings that you want to have, right?

01:00:05

The cross-pollination thing. I think that's really important for any engineering leader. So Ganesh, if people love this conversation and they want to reach out to you, ask more questions or maybe find out more about your product and things that you built, is there a place where they can find you online? I'm on LinkedIn, just enter my name, and you can also email me at ganesh at cortex dot io. All right, lovely. So thank you so much for today's

01:00:28

conversation. So I believe people have learned a lot of things about building engineering excellence and hopefully they can achieve that. So thanks again for that. Thank you for having me. Enjoyed the conversation.

Transcript source: Provided by creator in RSS feed
