
The Gene Simmons of Data Protection – KISS (Keep It Simple, Stupid): A Data Security Dilemma

Apr 02, 2025 · 25 min

Summary

James Rice from Protegrity discusses the KISS (Keep It Simple, Stupid) method for data protection, emphasizing making data useless to attackers rather than building complex defenses. He explains why traditional cybersecurity often fails and advocates for embedding security into the data itself through techniques like tokenization and de-identification. James provides real-world examples and actionable steps for implementing a data-first security model.

Episode description

The Gene Simmons of Data Protection: Protegrity's KISS Method

Today, we are kicking off a new series on the podcast, entitled The Gene Simmons of Data Protection - the KISS Method, brought to you by none other than Protegrity. Protegrity is AI-powered data security for data consumption, offering fine-grained data protection solutions, so you can enable your data security, compliance, sharing and analytics.

Episode Title: KISS (Keep It Simple, Stupid): A Data Security Dilemma with James Rice

In this episode, we are talking with James Rice, VP at Protegrity. He is going to help us strip away the nonsense when it comes to securing data, and help us understand why we don't need a fortress... just a kill switch. While companies throw billions at firewalls, AI-driven threat detection, and fortress-like defenses, attackers still find their way in. James reminds us to keep it simple with Protegrity's KISS Method, which stands for Keep It Simple, Stupid, and how when data is useless to attackers, breaches become mere inconveniences instead of existential threats.

Questions:

  • What exactly is the KISS method, and how does it apply to cybersecurity?
  • Why are traditional cybersecurity approaches failing to stop breaches?
  • What are some of the biggest myths about security that lead businesses to waste money on ineffective defenses?
  • How do encryption, tokenization, and de-identification work together to make stolen data useless?
  • Can you share an example where a company’s focus on complex security backfired, and how a simpler approach could have helped?
  • What’s the biggest pushback you hear from companies hesitant to adopt a simpler, data-first security model?
  •  If a company wanted to implement the KISS method tomorrow, what are the first three steps they should take?


Our Sponsors:
* Check out Kinsta: https://kinsta.com
* Check out Vanta: https://vanta.com/CODESTORY


Support this podcast at — https://redcircle.com/code-story/donations

Advertising Inquiries: https://redcircle.com/brands

Privacy & Opt-Out: https://redcircle.com/privacy

Transcript

Hello, listeners. Today we are kicking off a new series on the podcast entitled The Gene Simmons of Data Protection, The KISS Method, brought to you by none other than Protegrity. Protegrity is AI-powered data security for data consumption, offering fine-grained data protection solutions, so you can enable your data security, compliance, sharing and analytics.

In this episode, we are talking with James Rice, VP at Protegrity. He's going to help us strip away the nonsense when it comes to securing data and help us understand why we don't need a fortress, just a kill switch. While companies throw billions at firewalls, AI-driven threat detection and fortress-like defenses, attackers still find their way in.

James reminds us to keep it simple with Protegrity's KISS method, which stands for Keep It Simple, Stupid, and how when data is useless to attackers, breaches become mere inconveniences instead of existential threats. Well, James, thanks for being on the show today. Thank you for being on Code Story. Yeah, thank you for having me.

Before we jump into the KISS method and Protegrity and cybersecurity and all the things that we're going to dive into today, tell me and my audience a little bit about you. So my name is James Rice. I'm responsible for really the data protection and data privacy architecture here at Protegrity. I've been doing cybersecurity really most of my life. I started off in consulting when a consulting company was recruiting and said, hey, come join our security practice.

To a 20-something, that was just James Bond enough to be like, yeah, cool, let's go do this. And spoiler alert, it's not quite so much James Bond, but I got lumped into the security bucket very early in my career and have really spent my entire career there. And so my focus is on how to help companies take advantage of security, data protection, data privacy to really help get the most out of their most valuable asset, which is their data. I know it's an overused analogy.

Data is the new oil. But really, our goal here at Protegrity is to help customers accelerate key data initiatives by overcoming challenges to using sensitive information. Tell me a little bit about you outside of work. What do you do for fun?

Yeah, absolutely. So I live in Denver, Colorado. I am an avid outdoors person and skier, so I'm a little disappointed that we're getting pretty close to the end of ski season here. I may be able to sneak one more in, but typically, yeah, you'll find me doing something outdoors. And if it's winter, definitely trying to get to the ski slopes.

That's super cool. Okay, well, let's dive into the meat of it then. So, today we're talking about Protegrity's KISS method, right? What exactly is the KISS method? And, you know, what does it stand for? And how does it apply to cybersecurity? Too often I hear that security, compliance, or audit concerns are slowing down or really completely stopping an organization's ability to use sensitive data. And that's because they have two opposing forces working on their data at the same time.

They have the usability of data and the security of data. So that means you have data and business teams that are focused on value and consumption of data. They have new options, new cloud capabilities, and AI that's driving really increasing business value.

But if you juxtapose that with security and IT teams that have more of a risk and compliance lens where they have regulatory or security requirements that are putting tighter controls on data, but it's putting tighter controls on data for good reason. Really, whether you're in IT, security, data, or business, it doesn't matter. Unblocking sensitive data consumption is one of the most important things any company can accomplish in the next 6 to 12 months.

The sooner the better, because the business often wants to move and will just accept more risk than we want it to. So the KISS method is really a simplification discussion. Data security is hard and lots of vendors talk about data security. Heck, you'll hear data warehouse and repository vendors talking about what they do. You hear firewall vendors talking about what they do. The problem is you're having to balance not just security risk,

but business consumption needs. And so that makes data security even harder. Because data needs to be moved and consumed and analyzed and shared and trusted and compliant, local, global, internal, external. I could probably go on and on. But that's challenging, too, because you have opposing forces there. Wait a minute. I need local data that's also global. That's a really hard problem to solve. So how do we simplify the cybersecurity approach to data protection?

There's this concept of really embedding protection into the data itself. So data protection should mean protecting the data, not everything around the data. And if you think about some of the capabilities around things like de-identification, what that does is it removes the sensitivity of data and replaces it with a token.

What that means is then protection flows with and is attached to the data itself, because in the end, data needs to be set free so that we can harness its power for our organization. It can be everything and everywhere I just mentioned. And when I say free, free to move across sovereign borders, free from regulatory compliance blockers, free to move to the cloud or to adopt new analytics, AI, or machine learning models. Free whether data is at rest, in transit, or in use.

Because the data itself should already be protected. So really what we're talking about here is taking all of these layers and layers of security, which, don't get me wrong, are super important. Security is absolutely a defense-in-depth strategy. However, the simplification is to say, at the end of the day, it's all about the data. And if we can attach security to the data itself, it will allow that data to flow more freely across an organization.

Interesting. So that makes total sense to me. You know, you're making the data protected, not essentially building walls around it. But why are traditional cybersecurity approaches failing to stop breaches? I can kind of come to some conclusions about what you're saying, but I'm curious what you have to say there. I'm not normally a stat guy, but I'm going to use a couple of stats, because it was said that in 2023, globally, we spent about $180 billion on cybersecurity.

And that increases about 14% year over year. I think the latest numbers were coming out for 2024, and it was around $200 billion. So about the same, anywhere from a 10% to 15% increase. However, compromises are increasing at a staggering 70 to 80 percent in that same time period. So, yes, clearly something isn't working. And spoiler alert for some of you, if you're in financial services or you're in health care.

Your breaches were some of the worst. If you're moving to the cloud, that's where a lot of these compromises are happening as well. So with the status quo, why does everyone else get this so wrong? Why have billions of dollars in annual spend not solved the problem of usability versus security?

I think it's easiest to really just look at what that spend is actually protecting. We spend a lot of money protecting the applications, worrying about access rights, securing the infrastructure, hardening the network, trying to lock down the cloud. But what are those things not protecting?

They're not protecting the data itself. We spent $200 billion surrounding our data with security. I like to call it envelope security. It's just like if you put a letter in the mail, you lick the envelope, you put the stamp on it, and you send it off. But if somebody in between were to grab that letter and rip that envelope off,

they've got your letter in the clear. That's really what I'm talking about. Because again, don't get me wrong, defense-in-depth strategy, all of these things are important. I'm not advocating anybody mail letters to their family without an envelope around it. That is important, but we're stopping just short of the goal line. It's the data that we think is the most valuable commodity, so it's time to stop just surrounding the data

and instead embedding protection into the data itself. That's the real key differentiation. It's the difference between trying to build walls and keep the bad guys out versus being able to actually embed protection into that thing that they're looking for the most.

So obviously, in what you're saying, there's a lot of folks that are approaching cybersecurity in a way that could be considered sort of wasteful. What are some of the biggest myths around security that lead businesses to waste money on ineffective defenses? I would say the biggest myth that I can think of is that traditional security of the past is going to address the data and AI problems of the future.

That's really the biggest disconnect here because, again, we go back to what we were just talking about. Most of the traditional security is looking at making sure your infrastructure is locked down, making sure you have access rights handled appropriately. But the capabilities of the future, the analytics, the AI, the GenAIs, the LLMs of the world, they want your data and they want to use that information. And as it's flowing through all of these systems,

it actually really becomes much harder to rely on traditional security. Because think about it, let's take GenAI just as one example. You have a much broader set of potential users. So you have both internal and external users who need the data and are querying it in a way they never have before.

This isn't like a Power BI report on your screen where all I can do is filter some sets back and forth, but the data is pretty much the same, just determining what filter I'm using to populate what I'm going to see on the screen. Now you have people who are able to just use natural language to say what they're looking for, and let the machines go figure all the rest out. And what the machines are then doing is also using unstructured data a lot more as well.

So you have this vast trove of unstructured information that's not typically as supported by some of those legacy models or traditional security models. And then finally, with GenAI, it's a much more dynamic nature of the way data flows. It's not just about running one query. It's about being able to actually look at training models. It's about being able to augment them during runtime with additional data sets.

And being able to analyze all of that in line is a very difficult thing to do. Again, I know it's an overused analogy of it's not a matter of if something's going to happen, but when something's going to happen. So when you look at things like GenAI and data sharing and analytics, even some of the standard things we want to do, like offshoring or outsourcing our data, all of those things really bump up against those traditional security models

that don't support or handle the data in a way that also makes it available for the business to use. It's more traditionally locking the information down and not giving someone access. Where with a data-centric protection discussion, you're actually trying to give more people access to data in real time. Okay, that makes sense. But how does this all work? How does it fit together?

Lay it out for me: how do encryption, tokenization, and de-identification work together to make data that's stolen by a hacker or a bad actor useless? It's about really taking a fit-for-purpose data protection approach. And what I mean by fit for purpose is, you mentioned a couple; I'll throw out a few more: anonymization, masking. There's a lot of different techniques that can be used, but at the core, it's a de-identification conversation.

So what we're really talking about is whether I can change the raw state of data to match the business need of that data. So let me give you an example. Typically, I would ask someone, what's your most sensitive piece of information? And I'll oftentimes hear things like social security number, very high risk. But follow that question up with the very next one of how many people in an organization actually need access

to see my social security number in the clear, and you'll often get an answer of very few. And why very few? It's because most of the systems and most of the people don't care if my social security number is 47631980, which, by the way, is also made up, or 1, 2, 3, 4, 5, 6, 7, 8, 9. It doesn't matter as long as it's a nine-digit number that looks and feels like a social security number and always uniquely identifies the same person, which by definition is the purpose of an SSN.
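
To make that tokenization idea concrete, here is a minimal Python sketch of a vault-based tokenizer for SSNs. It is an illustration of the concept described above, not Protegrity's implementation; the class and method names are invented for this example, and real products use hardened, format-preserving techniques rather than an in-memory dictionary.

```python
# Hypothetical vault-based tokenization sketch: replace a 9-digit SSN with
# another 9-digit value, mapping the same input to the same token every time.
import secrets

class SsnTokenizer:
    def __init__(self):
        self._ssn_to_token = {}   # vault: real value -> token
        self._token_to_ssn = {}   # reverse map, used for re-identification

    def tokenize(self, ssn: str) -> str:
        """Return a consistent 9-digit token for a 9-digit SSN."""
        if ssn in self._ssn_to_token:
            return self._ssn_to_token[ssn]
        while True:
            token = f"{secrets.randbelow(10**9):09d}"  # still "looks and feels" like an SSN
            if token not in self._token_to_ssn and token != ssn:
                break
        self._ssn_to_token[ssn] = token
        self._token_to_ssn[token] = ssn
        return token

    def detokenize(self, token: str) -> str:
        """Recover the original SSN -- only callers with that privilege ever see it."""
        return self._token_to_ssn[token]

tok = SsnTokenizer()
assert tok.tokenize("123456789") == tok.tokenize("123456789")  # same person, same token
assert len(tok.tokenize("123456789")) == 9
```

Because most downstream systems only need a unique, consistently shaped identifier rather than the real number, they can run on the token alone; only the narrow set of users who truly need the clear value ever detokenize.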

So what we're really talking about here is being able to apply different de-identification techniques. That was a tokenization example. With anonymization, you might take a date of birth because a machine learning model doesn't really care what my month, day, year of birth is. Unless it's trying to do something maybe based off my astrological sign, but most likely...

They want to generally know how old I am. And so I can protect someone's date of birth using anonymization to turn that into an age range that says this person is somewhere between 40 to 45 years old or whatever that may be. So it's taking really the business need for data and finding the right de-identification technique.

whether that's replacing it with a token cipher that can be used to uniquely identify someone, or generalizing data to make sure that you're not exposing some of the most sensitive information of an individual. Both of these are de-identification capabilities. They're just different techniques, and it really comes down to two sides of the coin. You're either redacting data or you're pseudonymizing data. Now, pseudonymization is a hard word, but it's an important word because...

redaction is actually about throwing data away. In that anonymization discussion, I just threw away the month, day, and year of this person's date of birth and gave you an age range instead. I'll never get back that information. Where with pseudonymization, you're able to actually replace that information with a token that still uniquely identifies someone, but you can still get back to the original data if you needed to.
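
A rough Python sketch of those two sides of the coin: anonymization buckets a date of birth into an age range and throws the exact date away for good, while masking (shown here for the customer-service scenario that comes up next) exposes only the last four digits of a value that could also be held as a reversible token. The function names and bucket size are illustrative assumptions, not a specific product API.

```python
# Hypothetical de-identification helpers: irreversible anonymization vs.
# consumption-time masking. Names and defaults are illustrative only.
from datetime import date

def anonymize_dob(dob: date, bucket_years: int = 5, today: date | None = None) -> str:
    """Redact a date of birth to an age range; the exact date is gone for good."""
    today = today or date.today()
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    low = (age // bucket_years) * bucket_years
    return f"{low}-{low + bucket_years}"

def mask_ssn_last_four(ssn: str) -> str:
    """Show a customer-service rep only the last four digits."""
    return "***-**-" + ssn[-4:]

print(anonymize_dob(date(1982, 6, 14)))   # e.g. "40-45" -- cannot be reversed
print(mask_ssn_last_four("123456789"))    # "***-**-6789"
```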

So let's take an example with a social security number and a customer service rep. I've called into my bank and they're trying to validate I am who I say I am. That customer service rep may need the last four digits of my social security number. So if that social security number was tokenized, I would have to re-identify it and mask some of the data. So this is where you start to see all of these different techniques coming together.

I can tokenize it at rest to protect it. I can mask it in use so that the right user or the right machine has access to the right data at the right time. Okay, let's put this in some real-world examples, right? Can you share an example where a company's focus on complex security backfired? And how a simpler approach would have been better or could have helped?

Yeah, a couple of things here, actually. I would maybe use the Snowflake breach as an example. And I'm not trying to pick on Snowflake. I'm actually saying, if you think about all the great things that Snowflake has in place, like access controls and masking and multi-factor authentication.

Yet, they still had a breach where bad actors were able to gain access to the data. That's because those bad actors penetrated all of those traditional security methods, and then the data was just sitting in the database in the clear. So if you juxtapose that now to the conversation we were just having of taking all of the most sensitive data and replacing it with tokens.

de-identifying that information, even when the bad actor was able to get into the system, all they would have seen were the tokens, because it's very unlikely you would be giving your DBA full access to all credit cards and social security number level information. So it's really about, again, being able to change the paradigm from surrounding data with security to applying it into the data itself. Where I would go with this is to give a couple of examples of how a simpler approach does help.

We were working with one of the largest credit reporting agencies in the world, and they had a cardholder data environment for PCI DSS. And what they were able to do by de-identifying those credit cards... is actually de-scope a lot of those systems for PCI DSS. Think about what that means. The auditor actually said de-identified data.

is not going to be held to the same standards as regular credit card information. And so that company was saving $40, $50, $60 million on audit, compliance, and other security that they no longer had to apply to their cardholder data environment. Flip this over to more of a business example. We were working with a large broker-dealer who was wanting to feed a machine learning prediction engine.

But guess what? Their security audit and compliance team raised their hand and said, no, you can't take that data to the cloud. You can't put that into these new systems because it's too sensitive. So by de-identifying that data, we're actually able to get the checkbox from security, audit, and compliance to say, okay, you're all good. Feel free and move that data forward. But again, another good example, there was some tokenization to move data to the cloud so that it was protected.

And then there was anonymization as part of the machine learning prediction engine. So the prediction engine could get the most out of the data as opposed to just being able to operate on de-identified information. Okay, got it. So what's the biggest pushback you hear from companies? Because all this just makes too much sense, right? Why are they hesitant to adopt a simpler data-first security model? What are they saying to you?

Really, it starts with a complexity conversation. And you're right. This is a simpler approach. But if you think about it from the perspective of a user who's looking to adopt this, they're saying, wait a minute, I've got credit cards flowing across hundreds or thousands of systems in my environment. You're saying I have to do something to all of these systems to be able to apply protection.

I would actually say no, that's not the case. If you think about a majority of those systems, they can likely operate without the actual credit card number. Think about an analytics platform. Your analytics platform probably couldn't care less what my credit card number is. As long as it's the same number across all of my transactions so it can correlate my data together, whether it's my real number or a made-up number is irrelevant.

And so what that means is I can let data flow freely into and out of my analytics platform without having to worry about all the other security layers and things that I would have maybe had to put in place because of the sensitivity of data in those systems. The other one really ends up being performance. There almost always is a performance conversation to be had here. And as much as I hate to say it,

Unfortunately, the Protegrity team has not overcome the laws of physics just yet. As far as I'm aware, most people haven't. And so, yes, there is a performance tradeoff with security that we're talking about. But what you're doing is wanting to try and make this the most performant option. And so what Protegrity has really done is said, look, there's multiple integration methods.

You can have transparent methods that are able to intercept data on the wire, HTTP, SMTP, JDBC type intercepts at the protocol level. You can have an interface-based approach where you call an API or a function call within an SDK to integrate in, or you can even install an agent on, say, an Oracle database, for example.

And so the point here is when you really think about how to combat both the complexity and the performance concerns, it's about having a lot of options available for you for how to integrate and how to apply this security based on what your business needs are. If your business says, I can't touch a database, then the agent probably doesn't make sense.

If your business says, I can't touch the app or the database, I need to do this on the wire, then that's where the transparent gateway approaches make the most sense. The pushback is not wrong. Data protection is hard and there is a performance tradeoff, but it's really about finding a solution that's optimized around both of those things to make it as easy as possible for an organization to scale with their business initiatives.
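
As one picture of what the interface-based option might look like in practice, here is a short Python sketch of an application calling a protection service before a record leaves the app. The endpoint URL, payload shape, and function name are invented for illustration; they are not Protegrity's actual SDK or API.

```python
# Hypothetical interface-based integration: tokenize sensitive fields via a
# protection service before the record flows on to analytics or the cloud.
import requests

PROTECT_URL = "https://protection-service.example.internal/v1/protect"  # invented endpoint

def protect_record(record: dict, sensitive_fields: list[str]) -> dict:
    """Replace the named sensitive fields with tokens; everything else passes through."""
    payload = {f: record[f] for f in sensitive_fields if f in record}
    resp = requests.post(PROTECT_URL, json=payload, timeout=5)
    resp.raise_for_status()
    tokens = resp.json()                 # e.g. {"ssn": "981234567"}
    return {**record, **tokens}

row = {"customer_id": 42, "ssn": "123456789", "city": "Denver"}
safe_row = protect_record(row, ["ssn"])  # safe to share downstream
```

A transparent gateway or database agent would accomplish the same thing without touching application code, which is the trade-off described above: pick the integration point your environment can tolerate.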

Okay, so I really appreciate you answering these questions and I'm curious. So I'm a company, right? I want to implement Protegrity's KISS method. I want to do it tomorrow. What are the first three steps I should take? Like, dumb it down for me. What do I need to do? You really need to start with architecting and prioritizing around a use case.

Just because data is everywhere doesn't mean you need to boil the ocean. You can really look at your business and probably find GenAI capabilities that you're trying to deploy into production that may be slowed by sensitive data. That's a great example of one where you can show immediate business value. You might have compliance needs around PII, PCI, or PHI that you have to worry about to be able to ensure that you're not going to run afoul of laws, rules, or regulations in certain systems.

So it's about finding that use case where data protection can have the most business value. That's really where you start. But from there, it splits off into two ways. There's a data discovery discussion and a data protection and privacy discussion. On the discovery side,

It's really about the ability to be able to locate, tag, and classify sensitive information. Because the approach I would suggest is you really look at segmenting your data controls on multiple dimensions, and you start with risk. All data doesn't have the same risks. Even PII has different levels of risk. You have credit cards and emails and first names. Those are vastly different. And such drastic differences mean you can't necessarily apply the same controls across all the different data.

But then you need to think about how that data will be consumed. Because just as all data doesn't have the same risk, all data is not used in the same way either. So really, it's thinking about the consumption needs of the business. Can this analytics platform operate on a token? If the answer is yes to that, then you've got a much easier route to securing that information.
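
One way to picture that two-dimensional segmentation is a simple lookup from (risk level, consumption need) to a protection method. The categories and mappings in this Python sketch are illustrative assumptions, not a prescribed policy, and they anticipate the fit-for-purpose choices discussed next.

```python
# Illustrative segmentation of data controls on risk and consumption need.
RULES = {
    ("high",   "exact-value-needed"):  "tokenize (reversible pseudonym)",
    ("high",   "correlation-only"):    "tokenize",
    ("high",   "aggregate-analytics"): "anonymize (bucket into ranges)",
    ("medium", "display-to-agents"):   "mask (show last four)",
    ("low",    "any"):                 "monitor only",
}

def choose_protection(risk: str, consumption: str) -> str:
    return RULES.get((risk, consumption), RULES.get((risk, "any"), "review manually"))

print(choose_protection("high", "aggregate-analytics"))  # date of birth feeding an ML model
print(choose_protection("medium", "display-to-agents"))  # SSN shown to a service rep
```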

And that's what leads us into the data protection and privacy, where you really then start to determine the fit for purpose protection. You're looking at your data that you've discovered and segmented on risk and consumption so you know why you need to protect it from a risk perspective and you know how the business wants to use it. Now you decide which sort of protection method to apply to it.

Is masking okay? Do I need to tokenize this completely to de-identify the data at rest? You make those types of decisions and really start applying security at the source when you're onboarding new data, which takes us all the way back to what I said at the very beginning. In doing all of those things, it actually allows you to set your data free.

Because with data protection embedded into the data itself, you can let that data more freely flow both internally and externally across your organization or to partners or other providers that you're working with as well. James, thank you for being on the show today. This all just makes too much sense. The old ways of doing things just aren't working anymore. Attackers are getting smarter with how they're able to get around the old-fashioned barriers, and the way to keep it simple

is to de-identify the data that is in the system and to make it useless when it's obtained by attackers. So I really appreciate you walking us through this and being on the show today. I really appreciate the time. At the end of the day, if you think about security today, it is oftentimes

a cost-center type discussion. And what we've just talked about, even the example that you just gave, was really looking at the business value of that as well, and turning security into a value enabler, because that data is flowing more freely, because I'm able to use it for all of the business. There you have it. This was a simple yet insightful conversation.

You don't have to build a fortress to keep your data secure. As James points out, you just have to keep it simple and make data useless to attackers. Thank you for listening to today's episode. If you'd like to learn more about Protegrity, go to protegrity.com. That's P-R-O-T-E-G-R-I-T-Y dot com. And thanks again for listening.

This transcript was generated by Metacast using AI and may contain inaccuracies.