Credential Digger – detecting leaked secrets on GitHub | The Open Source Way podcast

Karsten

00:01

Welcome to

00:02

The Open Source Way. This is our podcast series, SAP's podcast series about the difference that open source can make. And in each episode, we'll talk with experts about open source and why they do it The Open Source Way. I'm your host, Karsten Hohage, and in this episode, I'm going to talk to Slim Trabelsi about Credential Digger. Welcome, Slim. Nice to have you here.

Slim

00:28

Hi. Thank

00:29

you.

Karsten

00:30

All right,

00:31

let's see, who is Slim? Slim is a security research expert, inventing new cyber security tools and solutions to protect SAP and our customers, of course. He joined SAP about 15 years ago as a privacy and data protection expert, so you stayed true to your subject, I guess. And then you pivoted your interest to cyber security and threat intelligence. Currently, Slim's working on several projects on data leak prevention, and he lives and works on the French Riviera. That sounds very nice.

Slim

01:12

Yeah,

01:12

indeed. So, French Riviera: It's a very nice place to live and also to work because we have the possibility, just after work, after leaving the lab, to go diving, kayaking, running, cycling, and do many other outdoor activities there. We are quite lucky to have an SAP Labs here in the South of France.

Karsten

01:37

That

01:37

almost sounds stressful in the work-life balance if you have the option of kayaking and everything all the time.

Slim

01:45

Yeah,

01:45

exactly. So, it's a good place to work and to live.

Karsten

01:49

All right.

01:50

Let's maybe look at it as a place to work. What kind of SAP organization is that that's located in such a beautiful area?

Slim

02:00

Okay, so

02:01

we have an SAP Labs. So, that is called SAP Labs France. SAP Labs France is divided into three locations in France: one in Mougins, the biggest one, a small one in Paris, and a small one in Caen in Normandy. But here, we have mainly support and development teams, and also, we have a big security research team which I belong to. And also, a very new team, the e-mobility team, that is managing electric car charging, charging stations and so on.

02:36

And SAP Labs France is the first 100% electric car location at SAP worldwide.

Karsten

02:45

All right.

02:46

I guess, I think France is ahead of Germany as far as that's concerned anyway.

Slim

02:51

Indeed

02:52

yeah, we reached the highest number of charging stations in Europe very recently.

Karsten

02:58

Nice.

02:58

Okay, so I wouldn't have to worry on my way to Spain or to Southern France not to find a charging station. Good to know. I'll keep that in mind. But let's now maybe look at the credentials issue that we want to talk about today. And I think rumor has it that there was a time, some people say, and they probably exaggerated a little bit, when you knew everyone's password at SAP. How did that happen?

Slim

03:28

Yeah,

03:29

it's a bit exaggerated. But yeah, there was something there, maybe a bit of context: So, a few years ago there was a big data leak that happened to LinkedIn where a lot of accounts were leaked, including e-mails and passwords. And of course, LinkedIn is a professional social network and among these e-mails, there are some SAP e-mail addresses. And of course, there are some passwords related to this.

03:59

At that time, we found an archive containing all these passwords while we were looking for something else on the dark web. We were looking for zero-day attacks against SAP. We found all these archives and we decided to crawl the dark web in order to get more and more passwords related to this. And we discovered at that time that there were a lot of platforms that were still sharing stolen and leaked passwords.

04:28

So, we decided with the team to focus on this and to really create some tools that are collecting all these leaked passwords, especially related to SAP directly or indirectly. And there was indeed a joke in the lab here. So, if someone forgets his password, he has the possibility to come to me, and then I will remind him of his password.

Karsten

04:52

Okay,

04:53

maybe I'll add a little bit to that. Just to remind everyone what you said in the very beginning: That was a LinkedIn leak.

Slim

05:01

Yeah,

05:01

exactly.

Karsten

05:02

So, that

05:02

was password and user pairs where SAP addresses, e-mail addresses, were used for the log on to LinkedIn. So, this was not credentials for SAP systems or anything, like not internal systems or whatever, but simply an occurrence of SAP somewhere in public web offerings where the password was leaked. Right?

Slim

05:28

Exactly,

05:28

this can happen to any person, any company, anyone who is creating an account on a website that is at some point of time leaked. And now, this happened a few years ago, I think about 5 to 6 years ago, and at that time there were very few tools looking for all these leaked passwords. Now, all the companies, including SAP, they run several monitors and tools in order to prevent and alert in case of leaks like this.

Karsten

05:59

Yep.

05:59

Okay. Before we actually come to Credential Digger, that I guess got triggered then more or less. You mentioned one thing, I don't know, maybe everyone else knows but I'll ask you, what's a zero day attack again?

Slim

06:13

A zero day

06:14

attack or vulnerability. It's when we have a vulnerability that was not yet known by the software developer that someone identifies and decides to exploit instead of alerting in order to fix. So, zero day is when the vulnerability was not known but exploited before any patch or any fix for this.

Karsten

06:40

Okay, so

06:40

that doesn't specify anything technical, but rather says this is the first time the vulnerability comes to light. And unfortunately, someone who is not from the light but from the dark side has discovered it.

Slim

06:55

Yeah,

06:55

exactly.

Karsten

06:56

All right.

06:56

Okay. And so all this then, in the end, led to the Credential Digger project. Right?

Slim

07:04

Exactly,

07:05

yeah. In fact, at that time, when we discovered all these password leaks, we decided to create a tool that is monitoring and alerting SAP about this. It was a preventive tool that is crawling a lot of sources starting from the dark web, but not only the dark web. We were also looking at hacking forums and so on. And among all these sources there was one specific source that was a bit, I would say, interesting for us, that was GitHub.

07:39

So, GitHub is a platform initially designed for open-source sharing and where there was a lot of source code coming from SAP. That's totally normal because SAP has open source and there are a lot of employees from SAP using this platform. And we observed at that time that there were some passwords that were hardcoded in the source code.

08:05

And since my job was to innovate and create new tools, especially protecting our open-source organization, I decided to work on a tool that is detecting these passwords and then alerting the developer in order to fix the issue before publishing it to the public GitHub, and to help the open-source organization of SAP to push only clean source code. And at that time, there were very few code scanners that were focusing on all these secrets.

08:43

So, when I say secrets it's something like a password that should not be hardcoded in a source code. There are very few of them, and most of these password detectors or scanners were not able to detect non-structured passwords. So, we decided to respond to this, to train a machine learning model. We had a lot of expertise in training machine learning models but we decided to train a machine learning model only dedicated to recognize passwords.

09:18

This was the first time that a tool used machine learning in order to recognize passwords. And we integrated this into a scanner, and this scanner is called Credential Digger. And at that time, it was the only dedicated scanner for secrets that was able to detect non-structured tokens like passwords.

Karsten

09:42

Okay.

09:42

So, I take it now, does Credential Digger mostly scan code on GitHub for hardcoded passwords or does it also do what we talked about before, with the archives of LinkedIn passwords? Does it also scan like dark web sources or something if there are lists somewhere? Or what else does it do in detail?

Slim

10:07

Credential

10:08

Digger was designed mainly to scan for code, but we had a lot of requests coming from our colleagues, coming from customers, to extend the scope. So currently Credential Digger is able to scan any data source that contains a password. This means that any file with a text readable content, that could contain a password, can be scanned by Credential Digger. So, we can scan file systems, we can scan wiki pages, we can scan documents, web pages and so on and so forth.

10:48

Even if the main usage of Credential Digger is still to scan git repositories and source code. But as I said, we can do many other things related to this.
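
To make the idea of scanning any text-readable source concrete, here is a minimal sketch in Python. It is not Credential Digger's implementation: the function name `scan_tree`, the single AWS-style rule, and the temporary demo directory are all assumptions made purely for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumption for illustration: a single rule for AWS-style access key IDs.
PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_tree(root, pattern=PATTERN):
    """Return (relative path, line number, line) for every matching line."""
    findings = []
    root = Path(root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip files that are not readable as text
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                relative = path.relative_to(root).as_posix()
                findings.append((relative, number, line.strip()))
    return findings


def demo():
    """Build a throwaway directory with a wiki page, a note, and a binary file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "docs").mkdir()
        (root / "docs" / "wiki.md").write_text(
            "Setup notes\nkey: AKIAIOSFODNN7EXAMPLE\n", encoding="utf-8"
        )
        (root / "notes.txt").write_text("nothing secret here\n", encoding="utf-8")
        (root / "blob.bin").write_bytes(b"\xff\xfe\x00\x80")
        return scan_tree(root)


demo_findings = demo()
```

In this sketch the binary file is skipped because it cannot be decoded as UTF-8, and only the line in the wiki page is reported. A real scanner would carry many more rules and handle more encodings.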

Karsten

11:03

Okay. And

11:04

I just assume that, I mean, this issue was probably not completely new at the time. Were there no tools like that before? Or what's different about Credential Digger?

Slim

11:17

Yeah. So,

11:18

as I was mentioning, in fact, with Credential Digger we decided to really create a new paradigm in the way how we are scanning. In fact, as I was mentioning the other tools are using regular expressions. A regular expression is very useful in order to identify something that has a specific structure, like a standard, like an AWS Key from Amazon. We know that an AWS Key has a specific string at the beginning, a specific length, and so

11:55

on. But now, it's clear that for all the secrets that can be written in source code, like, I don't know, a password to access a database, MongoDB for example, there is no standard. We cannot detect it using a regular expression. Even worse, if we write a regular expression looking for this non-structured password, it will generate a lot of what we call false positives.

12:22

A false positive is an alert raised for something that is legitimate, such as someone correctly writing a call to a function in order to authenticate using a secure password. So, at that time, when we did the first study, we estimated this false positive rate at 80%. So, for example, if we pick up any other tool that is doing secret detection and run it on any open-source project, we will get 80% false positives related to secret detection.
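
The contrast between structured keys and unstructured passwords can be sketched in a few lines of Python. This is an illustration only, not a rule set from Credential Digger or any other scanner: the naive password rule and the four sample lines are hypothetical. The AWS rule relies on the documented shape of long-term access key IDs (20 characters starting with `AKIA`), and `AKIAIOSFODNN7EXAMPLE` is the placeholder key used in AWS documentation.

```python
import re

# Structured secret: long-term AWS access key IDs are 20 characters
# and start with "AKIA".
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

# Hypothetical naive rule for unstructured passwords: any assignment
# to a name that contains "password" or "pwd".
NAIVE_PASSWORD = re.compile(
    r"\b\w*(?:password|pwd)\w*\s*=\s*(\S+)", re.IGNORECASE
)

sample_lines = [
    'aws_key = "AKIAIOSFODNN7EXAMPLE"',      # structured secret
    'db_password = "s3cr3t-Mongo!"',         # real hardcoded password
    'password = os.environ["DB_PASSWORD"]',  # legitimate: read from environment
    'password = getpass.getpass()',          # legitimate: prompt the user
]

aws_hits = [line for line in sample_lines if AWS_KEY_ID.search(line)]
naive_hits = [line for line in sample_lines if NAIVE_PASSWORD.search(line)]
false_positives = [line for line in naive_hits if '"s3cr3t' not in line]
```

Here the structured rule flags exactly one line, while the naive rule flags three lines, two of which are legitimate code. That is the kind of noise a password model is meant to filter out.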

13:01

And when we talk about projects with millions and zillions of lines of code, it quickly becomes unmanageable. This is why, thanks to our password model, we reduced this to only 8% false positives. So, it's a factor of ten. And besides all these scans, there is another difference: Credential Digger is able to scan the history of the project. So sometimes we have open-source projects that are clean, that do not contain any secret, except in the history.

13:42

Because GitHub and all the git platforms keep the history, and in the history, at the beginning of the development of the tool, there were maybe some people, developers, interns, who left a password. And Credential Digger is able to go through this history, scan it, and alert on this.
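
Why a secret survives in the history even after it was removed can be shown with a small Python sketch. It is an assumption-laden illustration rather than Credential Digger's actual scanning logic: `history_patch`, `scan_patch`, the single AWS-style rule, and the sample patch text are all hypothetical. The only external call is `git log -p --all`, which prints every commit together with its diff.

```python
import re
import subprocess

# Assumption for illustration: a single rule for AWS-style access key IDs.
SECRET = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def history_patch(repo_path):
    """Return the full patch history of a local repository (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all", "--no-color"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def scan_patch(patch_text, pattern=SECRET):
    """Return (commit id, added line) for every added line that matches."""
    findings = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            if pattern.search(line):
                findings.append((commit, line[1:].strip()))
    return findings


# Hand-written sample in the shape of `git log -p` output: the newer commit
# removes the key, the older commit introduced it.
SAMPLE = """commit 2222222
Author: Dev <dev@example.com>

    read key from environment

diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -1 +1 @@
-KEY = "AKIAIOSFODNN7EXAMPLE"
+KEY = os.environ["AWS_KEY"]
commit 1111111
Author: Intern <intern@example.com>

    add config

diff --git a/config.py b/config.py
--- /dev/null
+++ b/config.py
@@ -0,0 +1 @@
+KEY = "AKIAIOSFODNN7EXAMPLE"
"""

sample_findings = scan_patch(SAMPLE)
```

The current version of `config.py` in the sample is clean, yet the scan still reports the key from the older commit. On a real repository, `scan_patch(history_patch("path/to/repo"))` would apply the same check to the actual history.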

Karsten

14:03

So,

14:03

actually the vulnerability has been removed.

Slim

14:06

Exactly.

Karsten

14:07

But, as

14:07

one in general says, the Internet never forgets. It's still there somewhere in the tracks. All right. And I think with the false positives, latest since the Corona tests, everyone out there has been thinking about the relationship between false positives and actual recognition rate of the true positives. So, I think the statistic concept behind there is way better known than it used to be before that.

14:39

Anyway, now everything that you have described, you are running because of course also Credential Digger itself is an open-source project, right? What's the advantage for such a security relevant tool?

Slim

14:55

Yeah

14:56

so, why we went open source. In fact, first of all, we had three main motivations. At the beginning, it was my first attempt to go open source, the very first motivation was resource management. When I say resource I'm talking about our interns, that are contributing to our projects. And then after six months they are leaving SAP for other opportunities, or some of them are staying at SAP but going to other teams. And we thought that it was the best option to keep them in the loop.

15:31

So, many of them were super interested in the project, they wanted to contribute, and it was very hard for them, especially if they left SAP, to still contribute to the project. So open sourcing it makes it very easy to retain these talents, even if they left SAP - very first motivation. Then the second thing was the complexity.

15:55

Sometimes we are doing innovation, we are creating tools, we are creating new proof of concepts, prototypes, and sometimes it's very hard to run this prototype on the customer side because it's not yet a product. A prototype is far from being a product, especially at the beginning. And making it open source will facilitate a lot of things, especially in the customer adoption, customer visibility also, because we can talk about it because it's open source.

16:27

We can share our test cases with anybody that is interested in this. So, there was a very good motivation to go open source. And by the way, we had a lot of, I would say, feedback from the open-source community. And I would say the obvious reason is that our main target is GitHub. GitHub is the very first platform to host open-source projects. So, for us it was super natural to make it open source, to make it available to this open-source community in order to secure their code.

Karsten

17:04

Okay, so

17:05

now it is itself open source and it mainly deals with the main open-source platform GitHub. Does that mean it's on any of the recommended lists, or like in the open-source world? Or are there any very prominent success stories?

Slim

17:27

Yeah.

17:28

So, when we launched the open-source version of the scanner, first of all, we had, I would say, a success in the pentester community. For example, Integrity, which is a company doing pentests and bug bounties, discovered the tool and started advertising it. For example, we were featured as tool of the week a few years ago, just a few months after the publication. And then we became more and more known in the community, the security community, the software development community.

18:04

And, so, at some point of time, beginning of this year, Credential Digger was recommended by GitLab Security. But also, the GitHub static analysis tool, that is an organization from GitHub that is listing the, I would say, the most recommended tools per development language. And then very recently, we were referenced by the NIST source code security analysis list as one of the secret scanners, that can be used in order to secure your source code and avoid leaking secrets.

Karsten

18:48

All right.

18:49

Would you happen to know what NIST stands for?

Slim

18:51

So,

18:53

NIST is the national security organization from the U.S. So, it is a governmental organization that is writing standards, cyber security standards, that should be applied by the U.S. government, the U.S. institutions, the U.S. departments of security.

Karsten

19:13

Nice.

19:14

That is a pretty prominent mention there. Let's maybe come to a critical point here after the successes and the appearances on lists, especially as Credential Digger is run as an open-source tool. How do you ensure it's only used in a defensive way?

Slim

19:34

19:35

first of all, Credential Digger was designed to be used as a preventive tool. This means that we need to know from the beginning which projects we need to scan. So, the idea is really for it to be used by developers to protect their source code. Of course, the natural way of executing scans prevents, for example, an offensive person from running blind scans.

20:10

And of course, we designed it as any traditional, or basic vulnerability code scanner, like other well known scanners looking for vulnerabilities. We are looking for secrets. Now, there are some ways to make it offensive, but we don't cover these ways. And the way how we are making it open source makes the task very hard for an attacker to run it in an offensive mode. But of course, any security tool, any defensive security tool, with some hacks, tricks could be used as an offensive

20:54

tool. But we limited this risk to the maximum.

Karsten

20:57

Yeah, and

20:58

I think in the end it's also the bad guys have such tools and use such tools anyway, right?

Slim

21:05

Exactly.

Karsten

21:06

So,

21:06

then why leave it to the bad guys, and then rather deploy the same mechanisms that they use to spy out passwords to find them before them, and take them out of the hard coded code and everything, right?

Slim

21:19

Yeah,

21:20

exactly.

Karsten

21:22

Okay, I

21:23

see, I get the point. Let's go to the famous last two questions. Then, the famous before last question is: Where would you send people if they want to find out even more about Credential Digger, or even contribute?

Slim

21:37

So,

21:39

very easy. We are on GitHub.com. We are in the official SAP open-source organization. So, on github.com/SAP/credential-digger you will find the code, you will find all the documentation, the videos, and the way how you can contribute. And if you want to contribute to this project, you will get a free t-shirt with a Credential Digger logo.

Karsten

22:07

Wow.

22:08

All right. Is that like this miner or something on there?

Slim

22:11

Exactly.

22:12

Yeah.

Karsten

22:15

I was just

22:15

thinking the old digger imagery is really from mining in the 1900s.

Slim

22:22

Yeah,

22:22

exactly. I'm a fan of basketball, NBA, and then there is the Denver Nuggets. And the thing is very similar to their logo. That was a surprise for me.

Karsten

22:35

All right.

22:36

I was directly thinking of that, because my original line of study is actually geology. So I can totally relate to that as well. But let's not get diverted all too much. Last one is always: If there are three short things, key takeaways, you want everyone to remember from this, what would they be?

Slim

22:58

Okay so,

22:59

first of all, in fact, attackers are lazy people. Lazy people means that they will not spend a lot of effort in order to attack a system or an organization. The best and easiest way to access a system is to get the keys of the system. It's like a person who is leaving their house key in front of the door, and of course the criminal will get the key and access the house very easily.

23:32

So, hard-coding and keeping secrets in the source code is the easiest and the most used way for an attacker to get access to the system. It happens very frequently, unfortunately, for diverse reasons. Especially for small companies, small organizations, that are not applying the security standard. So, make an effort to not write any secret in your source code, even for a prototype. And to be 100% sure, it's very easy to run a scan.

24:16

I don't only talk about Credential Digger but about any secret scanner. They are now automated, consume very few resources, and are part of the development pipeline. So for me, this is the most important thing because leaving secrets is the best way for an attacker to get access to all the systems. And the result of leaving one password can be really devastating.

24:45

We have many examples of big companies that leaked millions and millions of personal data, of credit card numbers, related to just one password left on GitHub.
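
One common way to follow the advice about not writing secrets into the source code, even in a prototype, is to read them from the environment at run time. The following Python sketch shows the idea; the function name `get_db_password` and the variable name `DB_PASSWORD` are hypothetical and not taken from the conversation.

```python
import os


def get_db_password(env=os.environ):
    """Read the password from the environment instead of the source code."""
    password = env.get("DB_PASSWORD")  # hypothetical variable name
    if not password:
        # Fail loudly rather than falling back to a hardcoded default.
        raise RuntimeError("DB_PASSWORD is not set")
    return password


# Usage with an explicit mapping, so the example does not depend on the
# real environment of the machine it runs on.
example_password = get_db_password({"DB_PASSWORD": "example-only"})
```

With this pattern the repository contains only the name of the variable, never its value, so a secret scan of the code and its history has nothing to find.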

Karsten

24:58

Okay, so

24:59

the main thing is lock your bikes, lock your houses, lock your cars, and don't put the keys on the front tire.

Slim

25:06

Exactly.

Karsten

25:06

Yeah.

25:07

All right. Great. Thanks for that one. Thank you very much Slim, for being our guest today. It was nice to have you here.

Slim

25:15

Thank you

25:16

for the invitation. I really appreciated talking with you and sharing my experience with you.

Karsten

25:22

All right.

25:23

And thank you for listening, out there everyone, to The Open Source Way. If you enjoyed this episode, please share it. And don't miss our next episode published every last Wednesday of the month. You'll find us on openSAP and in all those places where you usually find your other podcasts like Apple Podcasts, Spotify and the likes. Thanks again and bye bye.

Slim

25:45

Thank

25:45

you. Bye.

Transcript source: Provided by creator in RSS feed