Safety vs Security with Thomas Depierre

00:00

Today, Open Source Security is talking to Thomas Depierre. Thomas, tell us about yourself and why you're here. Hello, so I'm Thomas Depierre. Based on my amazing accent, you probably realize I'm French at this point. Or maybe not, because my accent, even for a French person, is really bad. I'm an open source maintainer. I am an SRE, which is a Site Reliability Engineer. Basically, I try to keep the thing running even when it shouldn't. And you probably know me

00:34

because I've been pretty loud on the internet about a few things around open source security and supply chain and all. For most of you listening, if you know about me, or at least some things I've written, it's probably because you have read my "I Am Not a Supplier" blog post. I blog at softwaremaxims.com, which I was amazed to find a couple of years ago was cheap and not used. Right, let's take it. So that's mostly why you may know me.

01:06

And I have already been on the Open Source Security podcast in the past, I think twice as a guest. And I think I'm back here because Josh wanted me to talk about all kinds of things I, you know, bother him with on Signal regularly. Okay, you have many opinions and much knowledge, but the one thing I want to talk about today is that I have learned an enormous amount about safety specifically.

01:37

I think in the world of computer security, we don't talk about or even understand the traditional safety practices that have existed for hundreds of years in some cases, but you understand this, and this is what I want you to discuss today, because I think you have solved a huge number of problems we're still talking about. I feel like that quite often when you tell me to go read a thing or inform me of why I'm wrong about something I'm talking about.

02:05

Yeah, so many people being wrong on the internet. So I will not say hundreds of years, because I think the way we do things in security (infosec especially, or cybersecurity, use whatever term makes your world seem better) is really, really close to what has been happening in safety for a hundred years.

02:28

Where safety may be interesting, I think, is that there have been different schools of thought on safety that have evolved over the past 50 or 60 years and that, I agree with you, can bring a lot to us. And we are seeing more and more software people go into these domains. And even in these domains, it's starting to feel like an invasion from software people, where we are searching for help, right? And so I think the first thing I would say is safety is not security.

03:07

And that's why I tend to try to use the term safety a bit more, because I find it a bit larger. But the problem with talking about things like safety is that you will get different definitions when you talk with different people. And that's also one of the things that I think these domains bring: the ability to hold multiple ways to look at and understand a domain or a field at the same time.

03:39

So for me, the definition of safety I like to use, and I use it a lot, is the absence of loss events, which is really, really nebulous, because loss events could be anything that you find really, really bad. Losing money, killing people, that's pretty bad usually. Or, you know, harming people. And depending on how you see it, a loss can be important or not. And so a loss may not be a loss event. Like if I lost 10 euros because I lost a pen at work, that's probably not a safety problem.

04:14

If I have the equivalent of a Knight Capital incident (for people that don't know, Knight Capital was an algorithmic trading company in New York, if I'm not mistaken, that had a small problem at some point and they had a computer run off; I'm not going to go into detail here, unless we want to, but they basically killed the company due to how much they lost in 45 minutes), yeah, that's the kind of stuff that I would define as safety. Like it's not only security, right?

04:50

At this point, you're also looking at safety. Or when we talk about security, safety comes up a lot, right? Because people may get harmed, right? When we talk about ransomware, for example, you will often see, when hospitals get attacked, an official public statement that will tell you that no one was harmed by this, right? "We found ways to save everyone, to handle it." That's not true.

05:24

I'm pretty sure you cannot have zero impact on the lives of the people that were in that hospital at that moment if the whole system went down. Maybe it was limited, but it was not zero. And so that's the kind of moment where we are starting to see that security is not only, hey, is my grandma getting adware, bad malware, an adware bar in her browser, and how does it affect her, right? And also, well, they pay money for it, because that's a ransomware thing.

05:52

Now we are maybe talking about safety, because now we have people that may not have the money and the time that they need to pay this. And so that's a real harm, right? So that's where I see this link. And yeah, to go back to, you know, things that have been solved: it's true that I am more interested in a particular way to think about safety, a particular school of safety, right? Which mixes academics and practitioners, right?

06:25

People like us working in real-life organizations and corporations and companies. It is built around resilience engineering, around Safety-II, around Safety Differently; these are all names that different parts of this school have used. And this is why there is a good difference with the way we do it right now in security. Right. Right. Okay, so that was the fire hose of safety, I think, right there. Which is amazing.

07:04

So everything you describe, and I think the thing I'll latch onto is that you talk about defining safety in terms of harm, right? And you mention stealing a pen versus killing a human. And I think in the world of security, we're starting to see people discuss risk as a concept, where I think historically security was always "we're going to make everything secure", which is ridiculous, right? That's not reasonable.

07:34

And now we're starting to have these risk discussions, and I'm curious: in the safety world, has there ever been this idea of "we will never hurt anyone in this environment"? All the time. I mean, if you have followed the road safety discussion a bit, especially in the US, you may have heard about Road to Zero, right? Or a goal of zero accidents ever. This is a real thing that people believe in and try to achieve, right? That exists. Now, the, yeah, yeah, yeah.

08:14

I have worked in facilities that had the goal of zero accidents. That is an acceptable goal, but no one expects that. Whereas I think in security, we do expect zero incidents sometimes. I can promise you that the people that fight for Vision Zero on road safety are really, really big on the fact that yes, this is achievable. This has been a constant problem with other people working in safety. But yeah, no, I get you on the goal.

08:46

And even the goal may be hard, because once you enforce it, once you start to make it a goal, you are starting to shift people's mindset and actions around that, right? But even if we talk about risk, I would say yes, there have been fights over, you know, zero and what risk is. But the problem, the next level of this, and something I feel a lot in software, is that there are systems where it's really, really hard to accept risk.

09:28

Let me give you an example, and I will probably come back to this one for other ideas later. The CrowdStrike incident: recently CrowdStrike had little oopsies with configurations that ended up breaking a lot of computers for a few hours to a day, depending on how you count, right? And the problem is this is so widespread in terms of use, right? And in terms of critical operations that depend on it, right?

10:01

Banks, airlines, stuff like that, that I am not sure there is an acceptable risk. And that's where you get into this kind of problem. It is the same with a nuclear power plant, right? There is something deeply grating where you are like, okay, there is a small risk that it explodes. Well, that's not really something we accept or want from this kind of thing. Right.

10:29

So there are systems in which talking about risk will probably not help you shift the mindset that much, because that doesn't fit the risk profile. Even if the risk is really, really low, it will just not match what we can accept in terms of impact. So that's the first thing. And the second thing is how do you quantify risk? Because it's one of these things that is much easier to know after the fact than before. And that creates all kinds of problems, right?

11:05

That's not to say that there are no ways to deal with that, and it's a topic we can talk about, how we can try to handle that, but that's a really, really hard proposition, right? To say, like, we can quantify risk. That doesn't mean there shouldn't be a discussion about risk; you should do that, and that's really useful. But using it as a goal or as a way to measure where we are going is pretty hard, right?

11:33

I mean, I think that's fair, but now I'm curious, do you think this was a problem in the world of safety, like a hundred years ago, where they... Even today. Okay, that's fair. I mean, I would expect that, because you're right, it's that sort of thing. Before an event, it's very difficult to describe the outcome that you don't want to happen, right? And even a one-in-a-thousand-year event can happen tomorrow, right?

12:02

Like it doesn't mean we have to wait a thousand years for it to happen, right? That's the reality of these things. And the other aspect is you can measure risk easily when you are talking about a system, right? I'm going to use the word system a lot. Where things are easy to reason about, you can think about it, you can say, okay, there is a linear causality here, I can decompose it into these five different systems, and these are the rules they use between them.
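[Editor's illustration, not part of the conversation] The remark about a one-in-a-thousand-year event happening tomorrow can be made concrete with a little arithmetic. The sketch below is a simplification that assumes the event has the same small probability every year and that years are independent, which real hazards rarely satisfy; the function name is made up for this example.

```python
def prob_at_least_once(annual_probability: float, years: float) -> float:
    """Chance the event occurs at least once within `years`.

    Assumes (hypothetically) a constant, independent probability per year.
    """
    if not 0.0 <= annual_probability <= 1.0:
        raise ValueError("annual_probability must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    # Probability of "never happens" is (1 - p) ** years; take the complement.
    return 1.0 - (1.0 - annual_probability) ** years


if __name__ == "__main__":
    one_in_a_thousand = 1.0 / 1000.0
    for label, horizon in [("tomorrow", 1.0 / 365.0),
                           ("30 years", 30.0),
                           ("1000 years", 1000.0)]:
        chance = prob_at_least_once(one_in_a_thousand, horizon)
        print(f"{label:>10}: {chance:.6f}")
```

Under these assumptions the chance for any single day is tiny but never zero, and even a full thousand years does not guarantee that the event occurs.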

12:35

I can really easily have my boxes and define how my risk flows and how it works, right? But that gets really, really hard once you start to introduce all the shifting boxes and complexities that we have in software today. And that's partially because risks get offset. Something we talk about a lot in site reliability engineering, SRE, is that you can build a highly resilient system, a system that has really low downtime, with really, really bad parts.

13:22

If you have stuff that breaks all the time, you can still combine it in a way that gives you high uptime. That's possible. We can show that you can do it. We do that all the time. We do that with hard drives, right? Where you combine them in certain ways so that when you lose one, you have a highly redundant thing and all. So it makes it really hard to look at it this way, right?

13:49

And the other side works too, which is that if you take these parts that break all the time and combine them in this complex system that has a higher uptime, when you have the perfect storm of breakage, you usually get a worse case, right? Because suddenly you have everything break at once, and your failure handling stuff doesn't work either, and then you get everything breaking down, right?
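[Editor's illustration, not part of the conversation] Both sides of this argument can be sketched with the textbook formula for redundant parts. The numbers and function names below are hypothetical, and the model is deliberately crude: copies fail independently, except for a single "common cause" event that takes every copy down at once, standing in for the perfect storm described above.

```python
def parallel_availability(part_availability: float, copies: int) -> float:
    """Availability of a group that works if at least one copy works.

    Assumes the copies fail independently of each other.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - part_availability) ** copies


def availability_with_common_cause(part_availability: float,
                                   copies: int,
                                   common_cause_probability: float) -> float:
    """Same group, plus a shared failure mode that defeats all copies.

    `common_cause_probability` is an illustrative assumption: the fraction
    of time a shared problem (bad config push, broken failover) is active.
    """
    independent = parallel_availability(part_availability, copies)
    return (1.0 - common_cause_probability) * independent


if __name__ == "__main__":
    flaky_part = 0.90  # a part that is down 10% of the time
    print("one copy:           ", parallel_availability(flaky_part, 1))
    print("three copies:       ", parallel_availability(flaky_part, 3))
    print("three + common mode:",
          availability_with_common_cause(flaky_part, 3, 0.01))
```

In this toy model, three copies of a part that is down 10% of the time give a group that is almost always up, yet a shared failure mode active only 1% of the time costs more availability than all the independent failures combined. Adding more copies does nothing to reduce that shared term.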

14:19

Right. And I feel like that is the story of the security events we hear about, right? Because there are no doubt millions of security events happening every day, but only a few times a year do we get that perfect storm of everything failing in just the right way that now, you know, someone has stolen millions of dollars or we've crashed the world. I mean, I think this is linked to something I talk about a lot, and I think it's useful to keep in mind when we talk about these things.

14:55

None of this should work. I'm sorry, but when you think about CrowdStrike, think about it. We are running an amazingly complex set of rules. Because that's basically what this is. It's a lot of rules, a database of rules and signatures and things to look at all the time for every system call, everything on your machine. On millions of machines across the world, probably at least hundreds of thousands, I'm pretty sure, that are all slightly different and quirky, right?

15:28

Different versions of Windows, different versions of Mac, all kinds of stuff, right? And you combine all of this and you have someone pushing an update multiple times a day for this configuration, these rules, which are super finicky, right? We're talking kernel-level stuff that is super hard to write, where if you make a single mistake, everything blows up. And we know now that the pipeline to push these things to production was not full of airbags.

16:00

It was basically write it, send it. It was not really helping you get it right. And yet, it basically never failed. The one time it failed, we heard about it, but it was one time over how many years of multiple daily updates. For sure. For sure. That should not work, right? Everything we know about how software works tells us that it should break all the time, right? Right. And yet it doesn't. And why doesn't it? Because I know you have thoughts on that.

16:35

Oh yeah, I have opinions: because people. So that's the school of thought I come from. So historically, security, I think, shares this with the usual safety mindset, which is that people are the problem. Thanks. No computer user will disagree with that. Right? But like, people are the problem, right? They don't respect your process, they make mistakes with the code, they click on the link they shouldn't. People are the problem, right? In security.

17:07

Right. So here's the thing I say. I'm used to talking mostly to SREs, ops people, people that try to keep the things running. So the usual thing I tell them is, okay, here's my little, you know, poll. Let's imagine everyone in your company, every single person, puts their hands in the air and stops touching the keyboard and mouse. At all. They stop. How long until your system goes down? One hour? One day? One week? One month?

17:53

If you're telling me more than a week, you are a liar. And so that's the reality when you think about it this way, and I could ask that of the security people the same way, right? How long would it be before you have security incidents, like big ones, if everyone would stop touching the keyboard?

18:14

At the very least, after a month you will lose a certificate somewhere that will bring things down, and then you will get a problem in a GPS somewhere, right? And so what this means is that your stuff works because people work on it all the time. Yeah. Yeah, for sure. So people are the ones that create the safety, not the ones that stop it. People are the solution, not the problem. It's just that this is normal work.

18:46

This is the work that they do every day, and so it becomes invisible because it's just the background. Yeah. Yeah. And so that's what I would say for this, right? Look at what people do, try to understand why they do it, and go from there. I think that's perfect. That is the answer I was looking for here, because I knew we would get there eventually. People are the reason it works, right? And you need to go look at what they do.

19:17

The other thing I would say is look at what they are not doing, even if you would want them to do it, and try to understand why, right? What makes it the right decision to ignore your stuff? Because that's where you can start to look at whether you are actually helping or not. And that's where we end up, because, like, I could stop clicking on links in my emails. But then I would probably be fired pretty fast. That's right. That's right. That's fair. I like that advice.

19:55

You know what that feels like to me, Thomas? It's that famous picture of the airplane from World War II showing where the bullets had hit, and people were saying we should reinforce those areas. But it's the places that weren't hit that were important. Everything you've kind of... okay, I'm sure it's not right. Don't get all technical on me here. But the point is what you mentioned, you know, what people aren't doing.

20:20

And especially as security people, we obsess over what people aren't doing sometimes. And some of it might be accurate to obsess over, but some of it almost certainly is not. And yes, but, and we should probably end on this, what I would say is if you obsess over people not doing things, then you need to understand what makes them not happen and make it possible for the people to do them. Right. And that may mean things that are not at all what you would consider security.

20:53

Right. And an example I would think of here, and I will not go into it, is the Rust compiler, right? It would be an example I would bring up of this kind of thing. And we will have you back to talk about Rust in the near future, I'm sure. But okay, Thomas, I want to thank you. This has been a treat and I have learned a ton. Thank you. I hope everyone learned, and I hope that I will also learn a lot from your other guests on this show, because I think that's the goal. Thank you.

21:21

And thank you, everyone.

Transcript source: Provided by creator in RSS feed
