Story: Leaving LinkedIn - Choosing Engineering Excellence Over Expediency | CoRecursive: Coding Stories podcast

00:00

Hello and welcome to CoRecursive, I'm Adam Gordon Bell. Imagine that you're at the forefront of shaping the front end for the world's largest professional social network, with like a billion users, LinkedIn, only to find yourself at a crossroads where your core values are just clashing with the demands of the job. Today we're diving into the story of a senior staff engineer at LinkedIn who faced a real dilemma. And if you're a loyal listener you might recognize the guest.

00:41

There we go. Okay, those are running now. Crap, did I script my ability to hear you? Hello, hello, hello, I did, hang on, hang on, hello, hello, yeah, yeah, we're good now. Okay, let's try that again. That's Chris Krycho and he was on the podcast some time ago talking about TypeScript. Today though he's talking about his time at LinkedIn. I loved my time at LinkedIn. It was great. And I'm really excited for where it takes me next because I couldn't do whatever it is I do next without

01:13

that, including the bumpy parts, but also the really bright parts. You know, you could come away from that whole difficult ending to the story thinking, wow, LinkedIn really sucked for Chris, but no, actually in the main it was really great. And I'm really glad of it. And even that bumpy ending, I'm glad of the lessons I learned. And looking forward to where it takes me next. So yeah, about that bumpy ending. Chris worked at LinkedIn in a pretty important and senior role

01:43

for close to five years. And he oversaw a bunch of important projects in the desktop app, what you see when you go to LinkedIn.com in a non-mobile browser. He worked on that. He led projects like modernizing just massive loads of JavaScript, but let's cut to the chase. Chris quit LinkedIn and he quit out of frustration. That's the story we're going to unpack today. What it's like to have a senior technical role at a place with thousands of devs,

02:10

right? How do you get big initiatives coordinated across many teams? How do you get things like that done? How do you lead big projects? How do you improve a massive codebase? Also, how do you balance sustainable development practices with a business need for speed? What if this thirst for velocity and speed of iteration clashes with your approach and your values and how you think things should be done?

02:39

How do you handle that conflict? And spoiler alert, it ends with Chris walking away from the job. That's what we're talking about today. But yeah, it all started in Sunnyvale. It was almost exactly five years ago now. I started the last days of January 2019, which feels like a lot more than five years ago. Thanks, COVID. But when I first got there, when you join a big company or good and healthy small companies, there's usually some variety

03:13

of onboarding. So I spent a lot of time going through the onboarding classes, learning how all the different layers of the stack worked and all of that. Chris was going to be working remotely from Colorado once he got going. But the first two weeks were onboarding. And the second of those two weeks was spending time with his team. Yeah, sitting next to and chatting with people on my team, my manager and a couple of the folks I worked with. Even then, that team was a bit more

03:39

distributed than most. So we had a couple of people who worked in that little corner of cubicle land in the Sunnyvale office. And we had a couple of people who lived in a corner of cubicle land in the SF office. And then we had one person working from Alaska. And then we had me actually working from Colorado remotely, which was very unusual for LinkedIn at the time. We had, I think, under a hundred engineers remote when I joined, and thousands of engineers total.

04:12

For Chris, working at such a place was a big change. The startup I'd been coming from had a relatively standard, relatively well-factored monolith back end and then, you know, database. And a handful of services here and there. And I got to LinkedIn and discovered, okay, number one, there are more people working on the front end client app than are employed by my previous employer. And there are more lines of code in this client app than exist at my previous employer

04:44

in total. And there are thousands of services running in the back end. And these monstrous API servers, which absolutely dwarf anything we had at my previous employer. But that's just the API server for one. And this is just the client you think of as LinkedIn.com. And oh, by the way, we also have our ad selling platform. And oh, by the way, we also have LinkedIn Learning.

05:09

And oh, by the way, we also, and just the list goes on and on. And so one of the really strong striking experiences was just repeatedly over and over again saying, I'm sorry, did you say insert number here? My previous company had what I thought at the time was a pretty decent-sized app, like 150,000 lines of code. And the LinkedIn front end, when I got there, had two million. I was just like, did you say million? That's that's a lot. That's 20 times almost the size of my

05:43

product. How do you even build that? How long to build? Oh, 17 minutes for a new build. That's that's not bad. We should probably make that better. But that's not bad. Chris's team was called the infrastructure team. But this didn't mean standing up servers. It was more like engineering enablement or developer experience. Chris's job was to make the front end of LinkedIn's massive desktop app easier to work on. And that's well, well, that's a lot.

06:10

What does it mean to be serving? And to some extent, helping lead, somewhere between 150 to 200 engineers committing to this app every quarter, shipping more and more and more lines of code. Just how do you do anything at that scale where there are dozens of teams, hundreds of engineers trying to ship one cohesive product, just kind of overwhelming at first the scale of all of that. Exciting also to be very clear. It wasn't like I was

06:41

like, oh, no, it was more like, this is cool. But oh my gosh, there's how do we do what? And also then you know, you think about tech debt or tech success, either one. But at that scale, all of those successes and problems are very much magnified. So you're like, oh, we have two million lines worth of tech debt. Whatever your tech debt problems are, it's multiplied by that rather than, you know, whatever your baseline is. And that's a lot. One of the first big changes that Chris helped with

07:13

was introducing JavaScript classes. JavaScript had introduced classes, but they were using the Ember framework and their code needed some careful updating. So how do we get two million lines of code to change how they author classes from, you know, pass an object literal into a function to, okay, now we're going to use class Foo extends subclass Bar or superclass Bar.
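
As a rough illustration of the two authoring styles described here, the sketch below puts them side by side. `defineClassic` is a hypothetical stand-in written for this example, not Ember's actual API, and `Foo` and `Bar` are just the placeholder names from the conversation.

```typescript
// Hypothetical stand-in for a pre-`class` object model (NOT Ember's real API).
interface ClassicClass<T> {
  create(): T;
}

function defineClassic<T extends object>(members: T): ClassicClass<T> {
  return {
    // Each instance delegates to the shared object literal through its prototype.
    create: () => Object.create(members) as T,
  };
}

// Old style: behavior is passed as an object literal into a function.
const ClassicFoo = defineClassic({
  greet(name: string): string {
    return `hello, ${name}`;
  },
});

// New style: native JavaScript classes with `extends`.
class Bar {
  greet(name: string): string {
    return `hello, ${name}`;
  }
}

class Foo extends Bar {
  greet(name: string): string {
    // `super` calls into the superclass implementation.
    return super.greet(name) + "!";
  }
}

const classicResult = ClassicFoo.create().greet("Chris");
const nativeResult = new Foo().greet("Chris");
```

Both styles produce objects with a `greet` method, but the inheritance machinery underneath is different. That difference is why mixing them in one hierarchy needs care.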

07:39

So that kind of thing was like, okay, we have code mods, but they're not 100% and there are complicated things about like, okay, if you take a native class from JavaScript and use it as a subclass of an old school Ember class, and then that is a subclass of a native class. So it's kind

08:01

of striped. We came up with the fun name zebra striping for that of like, if your classes look like a series of zebra stripes where some of them are white and some of them are black, think of that as, you know, old school classes versus modern JavaScript classes all the way along down,

08:15

there were interop bugs because the old system was never designed to work with this and that gets at maybe one of the biggest pieces that I learned there is for a migration to work at that kind of scale, it has to be as automatable as possible because otherwise it'll, it'll just never get done because the amount of work it would take to manually rewrite two million lines of code.
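
One way to picture the zebra striping problem is a check that migration tooling could run over a class hierarchy to flag risky chains. This is a minimal sketch under assumptions: the `ClassInfo` registry shape and the helper names are hypothetical, and nothing here reflects LinkedIn's actual tooling.

```typescript
type ClassKind = "classic" | "native";

interface ClassInfo {
  name: string;
  kind: ClassKind;
  parent?: string; // name of the superclass, if any
}

// Walk from a class up to its root, collecting the kind of each ancestor.
function ancestryKinds(name: string, registry: Map<string, ClassInfo>): ClassKind[] {
  const kinds: ClassKind[] = [];
  const seen = new Set<string>(); // guards against accidental cycles in the data
  let current = registry.get(name);
  while (current && !seen.has(current.name)) {
    seen.add(current.name);
    kinds.push(current.kind);
    current = current.parent ? registry.get(current.parent) : undefined;
  }
  return kinds;
}

// A chain is "zebra striped" when the kind switches more than once on the way up,
// e.g. native -> classic -> native.
function isZebraStriped(name: string, registry: Map<string, ClassInfo>): boolean {
  const kinds = ancestryKinds(name, registry);
  let switches = 0;
  for (let i = 1; i < kinds.length; i++) {
    if (kinds[i] !== kinds[i - 1]) switches++;
  }
  return switches > 1;
}

// Hypothetical hierarchy: a native class extends a classic class that extends a native class.
const registry = new Map<string, ClassInfo>([
  ["Base", { name: "Base", kind: "native" }],
  ["Middle", { name: "Middle", kind: "classic", parent: "Base" }],
  ["Leaf", { name: "Leaf", kind: "native", parent: "Middle" }],
]);
```

Here `isZebraStriped("Leaf", registry)` is true, while `Middle` and `Base` are not flagged. A check like this is cheap to run across a whole codebase, which fits the point about migrations needing to be automatable.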

08:40

It's just going to take months and months of work, even if you could work on it full time. And it's totally unreasonable to ask a product team to stop work full time to go get some nice new syntax for their JavaScript, even if it comes with performance benefits, even if it comes with end user experience benefits. All right, the product teams, right? Chris's team works on ways to improve developer experience, but they need everyone else on board. They need the product teams

09:05

that are busy shipping features to help. LinkedIn has a number of relatively well-baked processes, always undergoing some degree of revision, but a number of well-baked processes for wrangling large technical investments. And that dates back to when LinkedIn hit a scaling cliff on its monolith in the early 2010s and just had to go into a service oriented architecture, early microservices mode because it just, it could not scale a single monolith anymore at the point

09:40

it was at. And that was a tenth of the size that the user base is now and so on. So they basically had to stop work for a year. And they looked at that and, you know, I say stop work, stop product work. And they said, we're never going to do this again. It's too expensive as a business to stop all work for a year. And so we had a system for horizontal initiatives like that. And that's what they were called, horizontal, meaning kind of cutting across large numbers of teams.

10:09

And you had to pitch them. There's a committee that reviews them for the whole company and keeps the level of engagement there at a, you know, 10% of these teams commitment or lower so that you can just say we're not, we're not going to spend 100% of all of the product teams working on the flagship app on your technical initiatives. Keeping it under 10% was tough. So the plan was to use a lot of automation tooling to speed things up. But first they needed a green light from a senior

10:37

VP. So they crafted a pitch. So try to fit all of the wins and trade-offs and how that makes business value sense and what the actual, what the technical outcomes are and what the then business outcomes are from those technical outcomes in one page. As an aside, LinkedIn had a hilarious culture of one-pagers that were like three to six pages long. Are you saying you make a tool and then the specific teams kind of run the tool with the human in the loop and make sure like

11:06

this is yeah. Exactly right. Now we also learned from some of those early phases that if we didn't have to get those specific teams to do it and we could do it for them, that was also better because it's a lot easier for a team to say yes I will review your PRs and do a smoke test to make sure everything's good than it is to get them to sign up to say I will take a, you know, a half a week or a week of my product shipping time to do these code mods. If instead I can just say

11:32

hey I've got this automation I'm going to run it on your code base. Here's some training so you know what the output is going to look like and how to work with it and can you merge my PRs much easier much easier to get buy in from them and especially their management than saying hey can you do some work for us please. It took us 18 months end to end to do the Ember stuff. Most of it was in a six month period but you always have a long tail of teams that

11:59

are going to have to do it. Sorry we have to delay this for two quarters and another team that gets hit by well we would like to fund this but actually the CEO himself just vetoed our attempted roll out of a new product design and we're going to have to rebuild it from scratch and sorry when you go up against the CEO you lose so you're taking an initiative it's important for you to agree but no you can't have that or the CEO wins. After that project's success

12:27

Chris and his team had a clear next target. It was the flood of errors being generated in the front end. Like most companies LinkedIn has error logging for those kinds of things most you know your startup probably uses Raygun or something like that to ingest those. LinkedIn has its own internal infrastructure because the prices for the levels of errors we were filing every hour would be exorbitant. That's wild. I guess there's like an implication that there's a lot of like maybe

12:59

secondary code paths that are just broken and that's surprising to me. I mean I guess my snarky take is it shouldn't be it really shouldn't be it's like I mean to make it a little less snarky and to really speak earnestly about this when I look at software a lot of software is broken in those ways and it's not generally due to a lack of care on the part of engineers it's usually

13:30

down to a combination of structural factors of at least two sorts. Sort one is what I think many of us are familiar with just in the form of business pressures of look we've got to ship this thing and in a lot of ways that's a really good forcing function, motivated engineers are often

13:50

inclined to polishing for too long and and sometimes we just need to get the thing out into the world and that's again not necessarily bad but it can mean that we cut corners and we compromise on quality and without a very strong and robust engineering culture that understands yes that's a

14:11

valuable tradeoff to make but no we can't make it all the time or we'll end up with ultimately broken janky user experiences that in the short term may not hurt us but in the long term will actually hurt us as a business even when you're doing your best you end up with code that is

14:29

often subtly broken so your smoke test might not catch it your attempt at QA might not catch it it might be in an obscure path but when you've got, LinkedIn crossed the billion member count sometime this past year, well, a billion members touching what by the time I left was 3.2 million

14:49

lines of code in this monorepo half of which is tests and half of which is production code but like it doesn't matter how hard you have tried QA wise there's just going to be stuff that somebody thinks to do that no one ever thought of that particular pathway in combination of stuff to test

15:08

and so you end up with something that's broken and maybe the page hangs maybe it white screens on you you know somebody reloads it or in an app on a mobile device they just kill the app and restart it and try to do it again and well we all know that turn it off and turn it on again fixes a lot of

15:25

problems because you get out of those weird state bugs but there was a solution though if you took those issues one by one and just imagined that instead you had been using TypeScript so one of the things we did is go and look at okay what's our actual count of JavaScript errors

15:44

that we know would be caught by this there are some that you're never going to catch TypeScript is complicated JavaScript is complicated but there were classes of errors that we could say look if we actually do the whole migration how many of these things will stop going to our

15:58

logging infrastructure and when you're looking at a number that's in the millions per day you start to actually be able to talk real numbers of hey we could cut our logging volume from the application down by at least a quarter just get rid of 25% the number's probably higher than

16:17

that, that's a floor so Chris starts putting together a one-pager on moving to TypeScript the good news is migrating to TypeScript can be done incrementally lots of code bases have done it before the bad news is there's no way it can fit in a 10% time budget it's just too large of a task
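
To make the idea of "classes of errors TypeScript would catch" concrete, here is a minimal sketch of one such class: reading a property that may be missing. The `Member` shape and `headlineBanner` function are hypothetical examples, not anything from LinkedIn's codebase, and the sketch assumes the compiler's `strict` option is enabled.

```typescript
interface Member {
  name: string;
  headline?: string; // optional: hypothetical field that not every record has
}

// In plain JavaScript, `member.headline.toUpperCase()` throws a TypeError at runtime
// whenever `headline` is missing, and that error ends up in the logging pipeline.
// With `strict` enabled, TypeScript rejects that expression at compile time, so the
// missing case has to be handled before the code can ship.
function headlineBanner(member: Member): string {
  if (member.headline === undefined) {
    return `${member.name} (no headline)`;
  }
  // Past the check, the compiler has narrowed `headline` to `string`.
  return `${member.name}: ${member.headline.toUpperCase()}`;
}
```

Calling `headlineBanner({ name: "Ada" })` takes the explicit fallback branch instead of throwing. Removing the check makes the file fail to compile rather than fail for a user.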

16:37

but Chris's document was convincing TypeScript used to have this slogan JavaScript that scales and with those millions of lines of code with all of those errors and going through them it was pretty clear that LinkedIn needed those scaling benefits and so Chris

16:52

skipped the horizontal initiatives he sort of went grassroots went bottom up engineering managers would ask an engineer on their team okay you're advocating that we be early adopters for this or that we make a significant investment here why that document would just get handed to

17:09

them and so it became a really useful tool to just have the information available in a relatively concise format that said here are the problems it solves here are the wins it gives us here's how it stacks up even our hiring ability to compete with our peers who are trying to hire and then

17:28

people could fit that into their mental box of here's what the tool is here's what it gives us here's how that competes against other priorities okay yeah this makes sense and with that in place then those conversations about prioritization of it versus other things became a lot easier I

17:48

think it became a lot easier for anybody to just say no this this is worthwhile okay I get it that's yeah that's powerful it's like you're like oh I just told them something that was obviously true and then people had to do it yeah yeah that's right so TypeScript takes off and Chris becomes the

18:07

go-to guy for tough typing problems need help with a tricky bit of TypeScripting call Chris but with this new free time with this project underway the next big issue lands on his desk and this is where things start to spiral a little bit LinkedIn was the biggest user of EmberJS in the world

18:28

but they weren't super thrilled about it and it was a conflict Chris felt it too he was an open source Ember team member contributor but at LinkedIn as the DX guy he saw a lot of mismatches the long story short we ended up in a spot where my job a year and a half ago was

18:45

figure out a plan to get us off of Ember and onto React and at the same time we had senior leaders saying the cost of migrations at LinkedIn is too high we even with all these things you've been trying to do all the things we talked about about getting cost of migration lower making it things

19:01

that infrastructure platform kinds of teams do as much of themselves as possible it's still too high we feel like it is too much of a cost for our product velocity now I would argue that some of the reasons for decreased product velocity are you have a three million line of code app that's seven

19:19

years old that has a fair bit of debt piled into both technical and product debt that's just going to slow down over time it's very hard to keep up that kind of velocity permanently and you can't just change things because you're going to break user things so even the product design

19:36

work is harder because how and where does it fit in the product is harder at that scale but the message we were getting from leadership in any case was we want to see the cost of migrations be lower so how do you migrate three million lines of Ember code to React code in a

19:52

way that's low cost think about this this is a hard problem do a big migration but don't slow anybody down while you're doing it we came up with a strategy that we thought answered the thing they were saying and it was a roughly we expected it to take three to five years and three years

20:09

was very optimistic our thought was just double down hard on the automation side of it make this a thing that product teams basically never have to do a lot of work on and at the the idea was sequence it and chunk it up in ways pull apart the threads of it so that you could tackle one chunk

20:26

of it do it end to end and then move on to the next one and some of them could be parallelized some of them not so much so figure out the sequencing figure out the chunking parallelize what you can change the build pipeline change the data layer change the routing layer figure out a bridge

20:44

for the reactivity system that understands how to handle the routing and the view layer and then finally kind of flip the view layer and the reactivity system over from the Ember rendering and reactivity system to the React one at the very end but in a way where you've written automation

21:00

that can do it so that was our pitch it was like I said three to five years going to be a lot of work but the idea was product teams will never really have to stop that's a huge undertaking like rebuilding a car piece by piece while driving it continuously on a long road trip

21:18

that would be impressive but three to five years that's a really long time meanwhile while Chris's team was prepping their plan for the engineering bosses another team reached out this team we're going to call the finger guns team not their real name by the way but they had a

21:35

plan too and they were tackling a similar problem including what they framed differently and in my view correctly as the thing that engineering leadership executive and otherwise really wanted which was that word of velocity how do we ship features and ideas faster how do we iterate faster

21:56

how do we take an idea from idea to A/B testing can we get that down to be a matter of weeks rather than months because you know it has been months and they identified okay we have this split in our stacks we have this big desktop stack we have a different mobile web stack we have these long

22:20

cycle times on our iOS and our Android apps can we get those times down and I think in retrospect they correctly identified that when our leadership was telling us migration fatigue migration fatigue was a symptom and the actual problem they were really caring about was that it was can we focus

22:42

on faster iteration trying ideas killing ideas if they're not good ideas or they're not working faster and migrations get in the way of doing that because our engineers are spending 10 or 20 or 25% of their time on migrations and not on being able to iterate on these things and the plan

23:03

we proposed didn't address that at all and this other team came in with a pitch that was kind of a blow up the world pitch it was what if we rethink this all from the ground up it was completely what another colleague of mine once described as being in finger guns mode meaning like yeah yeah

23:21

this is going to be awesome man kind of finger gunning at each other without answering any of the kinds of questions about what does it look like to operate this when we're trying to support hundreds and hundreds of engineers and I think it wouldn't have made me so mad if they'd come in

23:36

with an attitude of here's an idea but we recognize that there are things we're going to miss because we're used to supporting a dozen not 180 and being 30 times as many engineers is just going to reshape things sometimes that fresh perspective lets you come in and say yeah I hear you and we do

23:54

need to make sure we solve that problem what if we try this and maybe it won't work but what if we try it there's a way to do that the team that came in did not do that they were like nah it just won't be a problem man well no it will maybe we can solve the problem and I'm up I'm totally up for

24:10

you telling me let's try this way of solving the problem instead but no no it will actually be a problem we have a lot of experience to tell you from from being in the trenches with it this is a real problem and so it was very much a case of trying to find ways to be collaborative while

24:30

actually just being perpetually pretty frustrated that our message didn't seem to be getting through to this other team, like, no, there are concerns you really do actually need to care about here and we're happy to help you care about them but we're just fundamentally not sold on the

24:45

thing you're trying to sell because you're not showing us from our perspective a seriousness about the the problem space and the actual difficulty of what you're trying to do here we think maybe it's worth doing but what about X and Y and Z and just getting fundamentally can't like blow it off

25:01

kind of responses it makes me think of like two investment advisors and the one is like S&P 500 index fund and the other one's like have you seen Bitcoin NFTs and like they're just like really pumped right and the enthusiasm is exciting I don't, maybe that's, maybe that's too extreme

25:20

of a dichotomy but I mean I for whatever it's worth that is a perfect analogy to how we felt so Chris is mad he's mad about this plan he didn't know that he could pitch a big idea like stop everything and and start fresh on a huge project with the promise of speed later on if he knew that

25:37

you could bet he'd have a plan right maybe a less wild plan maybe with more considerations but it would certainly have a more aggressive timeline than the three to five years but he didn't even know stopping the world was an option and there's there's also sort of a cultural clash here too

25:55

Jim a top very senior engineer on the finger guns team and Chris they just don't see eye to eye on a lot of things they have fundamentally different views on what software engineering looks like and so Chris's team presented their you know well thought out very considered albeit kind of dull

26:13

and long term plan to engineering leadership leadership hated it they probably have a hard time understanding a five year plan like yeah yeah I think that was a huge part of it and we weren't excited about the plan which made it really hard to sell and it's really hard to get a team of

26:32

execs and leaders to be really excited about a thing that the engineers are pitching as a well it'll get the job done kind of like it solves other things we think yes just to solve we think it kind of sucks but it sucks less than any of the other options that we can find so presumably this

26:52

is all sort of bubbling around at the exec level people tossing around solutions to a fuzzy problem that started as how do we switch from Ember to React but but really the core thing was maybe about how do we change things so that we can pick up more speed how do we get faster at

27:08

making changes at LinkedIn and really I imagine the execs have their own worries right they have numbers to hit they have market pressures they probably have pressures from Microsoft a plan to ditch all the cruft that had built up over time and plan to gain velocity to gain speed it must

27:24

sound pretty good and then Chris took some time off for Christmas and I came back and discovered that we'd been having site up problems where chunks of LinkedIn's user base would for up to about 20 minutes end up with a hey sorry something's wrong and not see the LinkedIn dot com page at all and

27:50

well I'll tell you two things one I hate on call and ops kind of work and two I was the most senior engineer on the team and got tapped to do this and it was a hundred percent on call and ops kind of work as we tried to get to the bottom of what those problems were so I basically I went on

28:10

Christmas break frustrated and came back like okay I've taken a deep breath I can do this we'll figure out these dynamics with the other team and instead what happened was things are on fire and also this other thing is still going on in the background and now you get to spend three

28:29

months doing the kind of work you hate most trying to figure out what's going wrong there so about the incident LinkedIn had these sort of pre rendering services they ran the client code with Node.js and sort of aggregated all the data from the back end and then they could send it to the

28:46

client in one big request bundling everything up for quick delivery right from the data center to the user but this had memory leaks and the services were going down so we hit the point where those boxes started running out of memory and we had a system designed to say ah you've tripped over a

29:04

limit memory usage wise we're just going to restart the box reasonable we had a couple things missing though one there was no alerting for that so no one was particularly getting alerted like hey you're getting a lot of memory kills I shouldn't say there was no alerting there was

29:19

insufficient alerting on that second there was a setting to say how many of these containers can be restarted at the same time that setting was a key in a YAML file for configuration and that key was typed but at some point someone set a value there that was a legitimate possible

29:40

value but was the maybe the wrong value to have for this system and by maybe I mean definitely specifically the number was approximately the entire number of these services that was running and they all you know start up at about the same time they all get about the same amount of requests

29:56

so their memory all creeps up at about the same rate which sounds bad and somebody might take notice of that memory creeping up except yeah insufficient alerting so this is growing and it had gotten bad enough that what we started seeing is anytime there was a long weekend or some

30:14

reason why there was another kind of deployment pause for long enough I mean this misconfigured toggle meant whoops we can restart them all at the same time you know what happens when you restart all of your servers at the same time suddenly they're not responding to user requests
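
The failure described here is a setting that passes type checking but is still unsafe for the system it controls. A minimal sketch of a stricter check follows. The `RestartPolicy` field names and the 10% default ceiling are assumptions made for illustration, not LinkedIn's actual configuration.

```typescript
// Hypothetical shape for the restart setting; field names are illustrative only.
interface RestartPolicy {
  fleetSize: number;             // total containers running the service
  maxConcurrentRestarts: number; // how many may be restarted at the same time
}

// Returns a list of problems; an empty list means the policy passed every check.
// `maxFraction` is an assumed safety ceiling (10% of the fleet by default).
function checkRestartPolicy(policy: RestartPolicy, maxFraction = 0.1): string[] {
  const { fleetSize, maxConcurrentRestarts } = policy;
  const problems: string[] = [];

  if (!Number.isInteger(fleetSize) || fleetSize < 1) {
    problems.push("fleetSize must be a positive integer");
    return problems; // the remaining check depends on a valid fleet size
  }
  if (!Number.isInteger(maxConcurrentRestarts) || maxConcurrentRestarts < 1) {
    problems.push("maxConcurrentRestarts must be a positive integer");
    return problems;
  }

  // A value can be a perfectly valid integer and still be wrong for this system:
  // allowing (nearly) the whole fleet to restart at once removes all serving capacity.
  const ceiling = Math.max(1, Math.floor(fleetSize * maxFraction));
  if (maxConcurrentRestarts > ceiling) {
    problems.push(
      `maxConcurrentRestarts (${maxConcurrentRestarts}) exceeds the ceiling of ${ceiling} for a fleet of ${fleetSize}`,
    );
  }
  return problems;
}
```

With these assumptions, `{ fleetSize: 100, maxConcurrentRestarts: 100 }` is rejected and `{ fleetSize: 100, maxConcurrentRestarts: 5 }` passes. A check like this could run in CI against the config file, so catching the bad value does not depend on a reviewer noticing it.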

30:29

and we just end up with this kind of thundering herd problem where you'd take a bunch offline and that would increase the pressure on the rest of them so they would start increasing their memory usage faster because they're getting more traffic so

30:40

rinse and repeat and we would end up taking down an entire data center worth of these servers in the background there had also been a right sizing process where the idea was we're going to drop the amount of CPU and memory usage across our fleet to try to avoid over provisioning where we think we've got sufficiently large headroom that everything will be fine and we can save money on

31:04

not buying new hardware if we don't actually need it. The way to think about the intersection of these things was that right sizing process brought down the ceiling and meanwhile the memory leaks were raising the water level in the room and all of a sudden now the water's at the ceiling and the combination of that with one bad configuration meant we were just hosed and so a bunch of other engineers and I were looking around saying we need better alerting better observability on this we

31:30

need more resiliency there's no reason that a node server running away should kill the host process that's managing these node processes instead we should kill the node process alert on that for that condition and restart the node process because then we don't need to ever bring the container

31:48

down we never get into this situation it's like there's not there's five opportunities here where we can improve the exactly exactly failure is inevitable right humans are going to make mistakes systems are going to experience power outages stuff is going to go wrong in the world and so the way

32:06

I think about software engineering versus programming or hacking or perhaps as a superset of those is about designing systems that support engineers in doing their job of getting to those product outcomes do we have resiliency at multiple layers of places we can catch that something is wrong fix the wrongness in a safe way here and alert and say hey by the way something changed and it's going wrong now so when it happens how do we make the system both the technical side of it and the

32:40

people side of it able to respond well if you have the inclination there's a lot you can learn from an incident like this solve the problem yes but also improve the system prevent the problem from ever happening one failsafe Chris explored is just falling back to client side fetching

32:59

if these services were down Chris's approach says a lot about him as an engineer he's all about you know not just the software but the systems around it and how can we improve these things right he even has a podcast called Winning Slowly that's all about steady gains and continuous improvement
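
The failsafe mentioned above, falling back to client side fetching when the pre-rendering services are down, can be sketched roughly as follows. The function and parameter names are hypothetical, and the two fetchers are passed in as plain functions so the sketch stays independent of any particular HTTP library.

```typescript
type PageData = { source: "prerender" | "client"; payload: string };

// Try the pre-rendering service first; if it fails for any reason, report the
// failure and fall back to having the client fetch the data itself.
async function loadPageData(
  fetchPrerendered: () => Promise<string>,
  fetchFromClient: () => Promise<string>,
  onFallback: (reason: unknown) => void = () => {},
): Promise<PageData> {
  try {
    return { source: "prerender", payload: await fetchPrerendered() };
  } catch (reason) {
    // Surface the failure (for alerting) instead of silently swallowing it.
    onFallback(reason);
    return { source: "client", payload: await fetchFromClient() };
  }
}
```

The `onFallback` hook matters as much as the fallback itself. A degraded path that nobody is alerted about would recreate the insufficient alerting problem from earlier in the incident.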

33:16

But yeah, the higher-ups just wanted this incident closed. Maybe this was the speed thing again, but whatever the cause, the incident meetings started to get a bit tense. So we had a multiple-times-a-week stand-up for it: let's status, what have we made progress on, let's report outward,

33:33

send info to execs, etc., about what progress we've made, ask for more help if we need it. And one of those, after a weekend where we had run a long experiment and found a new problem... This was a bad time to find a new problem, because a manager for the finger guns team, let's just call him Dave, had just taken over the incident response. He came in and pulled in a bunch of other people, and those people were fine, but it was a case, very clearly, of: I don't trust you to solve this problem

34:04

and I don't trust any of your answers, and so I'm going to pull in these other people to supersede you. Which is frustrating in its own right. That's how Jim lands in the middle of the incident call meetings, and things only go downhill from there. Why doesn't code review just solve this?

34:20

That was a direct quote from Jim. And my answer was: well, it didn't, and it won't again in the future, because people are going to make mistakes, and "just be better," again, is not an answer. Because what happens when it's some junior on the rotation who thinks that that seems reasonable,

34:39

and this PR was made by a very senior SRE, why am I going to question whether that value is reasonable? Like, yeah, there's a bunch of boxes, that seems fine. Those kinds of things happen. And to that question of engineering systems, one of the things I think about a lot is: does our

34:57

system only work, or does this process only work, or does this tool only succeed, if I'm acting like a senior engineer on his or her very best day? Or does it work if I'm a super junior engineer who's having a bad day? And we really want our systems to be workable for the latter case, and

35:17

that helps all of us, because even though I'm a very senior engineer, sometimes I have bad days. Sometimes my brain doesn't feel like it's working at all. Does the system still support me on those days, or does it punish me? I really would like to not be punished. The incident was moving

35:33

forward, but the pressure was mounting, and Chris felt like this group just didn't understand the scope of the problem or appreciate his considered approach. And, why isn't this just getting solved? And execs are unhappy with the rate of progress. And it's like, because you've got seven years of

35:48

piled-up technical debt and lack of resiliency that we're trying to fix. And if it takes us three months to fix seven years of negligence, I think that's actually not too bad, was kind of my take. I remember telling my old manager that that was easily the maddest I have ever been in any job,

36:08

ever, was when the finger gun manager came in and was telling me I was not doing a good enough job because it was taking a while to fix these deep-seated problems. And it really exacerbated my feelings around their other proposal, of just kind of a lack of seriousness about engineering

36:26

on this scale, of: no, these things aren't easy to solve. We have an unbounded number of memory leaks to fix, and because of the way this particular memory leak was shaped, they were all references to the same core object, so you could fix one of them and it didn't change the behavior

36:42

of the system. I was very frustrated. And in the background, this also kicked off the experiment for the alternative plan, which everybody bought in on. It was like, yes, this is going to work, and a bunch of engineers were like, but what about... No, thou shalt not ask "but what about."

36:59

Around the same time, his group took over my group in a reorg. Hostile takeover. It was a thing. Oh, he's your manager now? Yeah, he was my boss's boss, kind of situation, so not my direct manager, which is good, because I probably honestly would have quit on the spot.

37:18

We were in a, like, let's see... You know, sometimes people get in a different role and they end up learning a bunch of new things. Maybe this would all work out for the best though, right? Chris is a reflective, thoughtful guy. He could think to himself, maybe I can learn from this new skip

37:33

manager's business focus. He knew this was a needs-improvement area for him. The things I was thinking about were real problems, and they really did make things better in ways that I think some people at LinkedIn really do appreciate, and I think really do make a difference for

37:50

people using LinkedIn. But they were never the thing that the business leadership was most concerned about at any given moment. And that decoupling, I think, is a big part of why we ended up in very different spots, and what made it hard to communicate the value of those

38:08

things. And as Chris entered this more reflective state, reflecting on his frustrations, he noticed he had some weaknesses in building relationships. So many of my colleagues have personal relationships with folks that maybe I struggled even to connect with or figure out how to work with,

38:27

because they'd see them in the cafeteria. Specifically, Chris is thinking of Jim. Other folks I talked to had less of that challenge. Some of it was just, you know, the way that particular engineer related to our corner of engineering. Man, there were some challenges. But there were people who had very good

38:49

working relationships with him, because they'd just end up sitting down with him in the cafeteria, because it'd be a big group of people there. And it does really help, when you're in the midst of some kind of heated technical argument or whatever, to have had lunch with somebody, or otherwise to

39:05

have built that relationship. But when you have a really strong in-person culture, and a very high percentage of your people are all in person, that shows up in those kinds of things, because as a company culture, as an engineering culture, your norms are all around: yeah, we'll see each other in

39:23

the cafes and we'll bump into each other in the halls. And, you know, people would tell stories about ending up in the bathroom next to the CEO, right? And that, you know, it's a funny dynamic and a funny place to, you know, look over and you're washing your hands and there's the CEO washing his

39:38

hands. I could see the difference it made to have had constant, just physical interaction with someone who ended up being an SVP of engineering or a principal engineer or whatever, that you just know those people. This raises another point. We've been focused on this technical clash, speed versus

39:58

sustainability, but maybe it's more of a social thing: people under pressure just not hearing each other. I mean, we only have Chris's side of the story. Maybe this is all a social issue. Charity Majors had a great quote in a recent article where she said, at a sufficiently

40:16

senior level, engineer or manager, there are no purely social or purely technical problems. They're all socio-technical, and you have to be able to identify: where's the blocker on this? Is that the social side or the technical side, or usually some mix of the two? And what's the proportion, or what do I need to do to solve this? Yeah, so maybe it's both. Anyways, stepping back from the incident, Chris finds out that the finger guns plan is gaining ground. It's gaining traction, and it's grown in

40:45

scope. Now it's about rewriting both the mobile and the desktop apps. We're going to rethink everything about how we build all of our products at LinkedIn. Which, that's exciting, and there were some really interesting parts to that. And also, one way of saying what happened was, you could just say I

41:04

and my manager and my team lost, and that would not be a false way of describing it. Okay, my pitch lost, whatever, I didn't like it that much anyway. Can we make your version good? Oh, you don't even want to hear questions that read to you as anything but enthusiasm for, this thing's

41:23

going to be great. And I'm like, I want to make it great, but we have to tackle these things to make... Oh, you don't even want to hear that. Okay, well, I can keep butting my head against this wall. I had conversations with that manager where I was told, in literally so many words: you're too idealistic,

41:39

you don't care enough about the bottom line, you should change your values. I was like, nope, that's not how it's going to work, man. Outside, I did the politic "I understand that perspective, I see where you're coming from," and inside, the little flip switch in my head was flipping, saying nope, nope, nope,

42:00

warning alarms, klaxons going off all around me. Chris thought that their need for speed was blinding them to the real problem. A lot of the problems we had in the code bases that we had were the direct result of overvaluing velocity and refusing to stop and say: this thing over here,

42:20

this secondary path, doesn't work right; let's fix it or let's get rid of it. Either of those are good options. But refusing to do that, and constantly just: no, we've got to ship the next feature, and how fast can we get it out, and we got to ship the next feature, and how fast can we get it out.

42:35

When velocity becomes the primary or driving value that everything else is subservient to, it leaves you in a spot where maybe you have good velocity initially, but you can't sustain it over time. It's kind of the classic pattern, actually, for code bases as they age: if you're not continually

42:55

investing in them but you're continually extending them, you end up exactly where we were. And the things that I saw being pitched were all about maximizing velocity, and made not even a gesture at how are you going to handle these other things. And I kept

43:16

mulling on it, and I did my best to give it a fair shot, of: there's some interesting ideas here, there's some interesting technical directions here. But I was also remembering why I left my previous job, which ultimately was a very bad case of burnout, of the, I'm having horrific migraines

43:34

and the worst stomach pains I've ever had in my life, and unable to exercise, random outbursts of sobbing at random times, panic attacks. It was a bad time. I don't recommend burnout. It's not fun. And if I stay on this road, I know where that goes, and it's a bad spot, and I don't want to end up

43:56

there again. And I can either stay in a world where I'm constantly working not to be angry, because, you know, I did my best not to stay angry in those times, but when the things that you're running up against every day make it active work not to be angry, that makes your job not fun, to

44:16

say the least. And I can either, you know, try to turn this Titanic with my little rowboat over here and be mad that I'm failing. So that's not gonna work. Like, shoving it with your paddle doesn't turn the Titanic. Or I can just say, I'm just gonna paddle that way. The Titanic thing

44:31

implies that I think they're heading for an iceberg, and I mostly just mean the scale of the thing, not the iceberg side of it. I hope they don't iceberg. That would be sad. And, you know, a Disney cruise liner, maybe. I'm not gonna turn a Disney cruise liner with a rowboat and a paddle. So I

44:46

didn't. I said, I've learned a ton. I have some idea what it's like to work on a three-million-lines-of-code app now, that migrated a ton of TypeScript, at a huge company, and think about those big problems. And now I can go to something else and not burn out and not be angry at my job every day,

45:05

or not be fighting not to be angry at my job every day. I'm not gonna spend years of my life trying to build in a way and on things that I ultimately don't believe in. Life's too short. That would be a waste. Okay, this sucks, but I guess it's time to walk away. Not out of spite or malice or any of that,

45:27

but just saying we're going different directions. Maybe the values that this particular corner of LinkedIn was embracing at this particular time are not morally objectionable, but they're not mine. That was the show. Thank you, Chris, for being so candid. I feel like stories like this are

45:52

unfolding at a hundred different places all the time, these socio-technical issues, people butting heads about the right way to solve things, but they never get heard outside of small circles. So thank you, Chris, for sharing.

46:13

Chris is, at the time of this writing, looking for work. You can find more at chriskrycho.com. I'll put a link in the show notes. It shouldn't surprise you to hear that Chris is looking for a role that aligns with his personal values. That's a big takeaway from our chat: you need to agree

46:30

with your team and your organization about what's important, and sometimes those things change and you just need to part ways. But specifically, Chris is looking for a role where he can help drive engineering excellence. He's pretty passionate about finding ways to make building software better.

46:49

He has a whole blog post about this and what he calls ratchets, and you should check it out on his website. And if you like the podcast, and you made it this far, so unless you're, like, hate-listening or something, I assume you do, if you want to support the podcast and keep it going, the best

47:07

thing you can do to support me is go to corecursive.com/supporters. You know, I have no sponsors of the podcast besides individuals. Link in the show notes. You can join as a podcast supporter, and you'll receive access to bonus episodes and join the community. And until next time, thank you so much for listening.

This transcript was generated by Metacast using AI and may contain inaccuracies.
