GitHub Advanced Security with Jacob DePriest

Apr 11, 2024 | 30 min | Ep. 940

Episode description

Jacob DePriest is the Deputy Chief Security Officer at GitHub! From discussing the challenges of maintaining the security of one of the world’s largest code repositories to sharing insights on the latest cybersecurity trends, Jacob talks to Scott about what it takes to safeguard GitHub and its millions of users.

 

Whether you’re a developer, a cybersecurity enthusiast, or just curious about how GitHub keeps your code safe, this episode is a must-listen. Tune in to gain a unique perspective on security from the heart of GitHub itself. 

Transcript

Hey friends, I'm Scott Hanselman and this is another episode of Hanselminutes. Today I'm chatting with Jacob DePriest. He's the Deputy Chief Security Officer at GitHub. How are you, sir? I'm doing well. Thanks for having me today. So I always look at people's LinkedIns, which is the standard thing you do when you fill up 47 tabs with questions for an interview like this. And not everyone, when you scroll on their

LinkedIn, spent 15 years at the NSA. Of all the three-letter-acronym agencies, that is the one that is most mysterious and shrouded in mystery. How was that experience? It was great. I did a bunch of different things there over the years from software-defined radio to digging into open source and DevX projects and even running some kind of large-scale IT and security projects. I think the fun thing about my time there, particularly at the end,

is I actually pivoted and was more open. I was working on open source activities and partnering with other agencies and trying to figure out how to help developers get more active in the open source community from contributions and releasing projects. And so it gave me an opportunity to be a less mysterious face on behalf of the agency. Yeah, you were like the senior OSS evangelist and

I'm trying to get my head around how you would even pull that off. I mean, I just assume if you worked for a three-letter acronym agency, then you would just not even tell your neighbors, right? Like we see the TV shows and what do you do? I work for the government. That's all I can tell you. But you're like, no, I'm evangelizing open source for the NSA. How did you connect the two? Yeah, I mean, I think it starts as many of our stories do of just following a passion, following

an interest. We were working on a project and we thought it would be great to open source it because we wanted to partner with universities and the military and other places that didn't necessarily have access to the internal networks we were working on. And that spawned an 18-month effort to release a million lines of code. And I learned a lot in that process and wanted to make it better for other developers. And so really just kind of leaned into that and spent some time looking into it

and talking to folks and figuring out how we do more of it. So what's cool about that is that it fits nicely into why you would come to GitHub. You basically became so open and so into open source that passion led you into doing open source at GitHub. Yeah, exactly. I couldn't think of a

better place to come after my time in the federal government. And the fun thing is, too, in some of the other things that I was involved in at the agency, I really got to understand and think about risk at very high levels as it may impact nations or critical infrastructure and things like that. And so combining the leadership training and the risk training I got at the feet of some very amazing leaders there with my open source passion, working on security at GitHub has been a really

great and natural next step for me. And I think actually we may not have met in person, but we were both, I think, at All Things Open in 2020 in North Carolina. Yeah. And you were talking about DevOps. So where does DevOps fit into your life? Like I know that the name DevOps is itself new. We used to call them build servers or SDLC. But now it's like DevOps is baked into the zeitgeist. When did

open source and DevOps kind of merge for you? It was, again, one of those natural progressions where I was working on open source and the processes and legalities of how to do that better inside of a federal agency. And it became evident that that was only one piece of the puzzle. And so without a consistent developer experience platform for the developers inside the agency, the open source piece was more difficult to achieve. And so a group of us, actually two or three of us, got together

and essentially kind of did a startup inside the government. We put a pitch deck together. We got funding. We put all the kind of like pros and cons together and we started a program called DevEx that ended up kind of being responsible for all the DevOps pipelines, developer security, collaboration tools and kind of all the things that you would need to be successful for an agency

of 40,000 plus folks. Is it fair to say that even in 2024, like there are companies that are just probably building on someone's laptop and then don't have any kind of mature DevOps slash build servers slash even DevSecOps kind of a practice within their organizations?

I think that's true. I think we also see a lot of fragmented approaches where there's 10 or 20 instances of Git servers and things are under people's desks and code's spread around enterprises and they're not getting the benefits of sort of the central collaboration, tooling and security, honestly, that comes with some of the things that you can do when you pull those things together and take the burden off of individual developers. Yeah, the under someone's desk like

totally resonates with me like this so many times. It's a story that I tell a lot but like long story short my blog I thought was a virtual machine for many years and then when it finally died and I called support it turns out they had never moved it into a VM and it had always been under someone's desk. It was literally like a mini tower under someone's desk and they were like well you know

we're getting around to it it was like a hosting service. They never imaged it they never backed it up so I FTP'd in did a frantic backup and you know the little spinning hard drive finally just pooped out at the end but I can visualize it under someone's desk. Never been more true that the cloud is just someone else's computer then. But is someone else's well run computer right? And that's the thing I think people don't like I joke about that too it is someone else's computer

but it's a best practice it's like a we are figuring out those best practices as a community. Yeah absolutely absolutely. You mentioned one of the things when you were talking about like listing out the kind of the pipeline of all the things for regular Joes and Janes like myself that have like maybe a small business like this podcast. I've got the website I've got a build in GitHub Actions. I call it I keep calling it a build server. My DevOps is effectively I check it into

GitHub and magic pops out the other end. That's kind of as sophisticated as it gets but you mentioned things like governance and security and like the supply chain should a regular developer like me a regular Joe or Jane be thinking about that level of complexity within their own DevOps pipelines? I mean you're asking the security person this question so I'm probably going to say yes to some

degree. I think the important part here is that it's yes only after the core things are done. So you know for the fitness folks after I've mentioned this before like don't don't ever skip leg day like I think for me that kind of core thing and the security translation is things like 2FA, making sure the account is secure, making sure that things like we don't have storage accounts that

are open to the internet and things like that. I think once all those things are done for the average developer then like leaning into some of the capabilities that you know for instance on GitHub a lot of our GitHub Advanced Security capabilities like Dependabot and Code Scanning and

Secret Detection are all available for free for public repos and so I think those are things that folks should turn on but for us when we think about security kind of for the product and for the community we always start with a developer account and that's why over the last 18 months or so we've

kind of worked through a campaign to turn on 2FA for all the contributors on GitHub which was a huge effort to basically say like no this isn't optional anymore if you're contributing actively on GitHub you have to turn on 2FA and that's that's why because we just see too many things start

with password spraying or a breach or credential leak. Yeah I'm 100% all in on 2FA and you know I use things like Authy but then I worry like it's becoming centralized like Authy just for example is pulling their desktop app which makes sense I probably shouldn't be doing 2FA with the thing

that I'm having in front of me I should have another factor but I'm like oh man like they're just going to stop caring about that there's all these authenticators and then of course the sheet that we're supposed to print out that no one prints out of their backup tokens I'm sure that there's a lot of really freaked out people who will lose access to stuff at GitHub if they don't print out

their backup keys. Yeah I think this is where I'm excited about passkeys and some of the innovations that are happening there where some of the major companies and we were actually involved to a certain degree in some of the passkey establishment are getting together to pull a standardized way of doing this together and you could argue that there's maybe like more secure ways like YubiKeys and you know biometrics and things like that but as a step forward from username passwords and

so particularly when it can be stored in a centralized way that's secure I think I think it's a great

progression that we're excited about at GitHub. Yeah you know you mentioned YubiKeys and I'm sitting here and I just picked my I'm holding up my GitHub YubiKey that was mailed to me that I never really used because it didn't fit into my lifestyle and I know that there are people who are so excited about that level that kind of security but it just never worked for me then there was USB-A and USB-C one of

them A one of them C and I just got tired and now it sits here unused which means that I'm somehow a bad security person now. Oh definitely not I don't I think my YubiKey is sitting in my bag somewhere because I've got all the passkeys turned on and FIDO and you know Touch ID and so that's how I operate every day so same security just don't have to find my YubiKey. And that's the thing right security has to fit into our lifestyles right if it is if it is inconvenient enough then we're

just not going to use it at all. Yeah that's right we have to make it work for users and that's partly why even though we're requiring 2FA for contributors on GitHub we're not requiring you know YubiKeys or passkeys or anything like that because the diversity of the population that uses GitHub for open source is huge and not everybody has access to a mobile phone or modern tech and things like that and so we want to balance security here with the accessibility

and equitability for our user base. Yeah I appreciate that you call that out by the way like equitability like even the pricing is modest you know GitHub Pro is like four bucks and you know for a time there when before private repositories were free it was like seven bucks I mean it's not

big money to support your small projects. Yeah agreed and I think you know we're still one of the few larger SaaS providers in this realm that offer free compute as well so free Actions minutes and free Codespaces minutes and things like that and particularly for universities they get an even you know more attractive kind of onboarding package there which is intentional right we want to

support the educational use cases but it's great it's a great way to get started. Isn't that attractive to bad guys though like the second you say free compute then someone's going to go and start mining Bitcoin or doing something naughty and as an organization it is a giant CMS you're not only shipping binaries but you're potentially building binaries that could be evil

and then helping distribute them are people abusing releases are people abusing raw. whatever GitHub CDN and using it and are you constantly just slapping people down for doing those kind of things? Yeah it's actually a huge huge challenge for the platform with a hundred million developers using the platform and a massive amount of compute there is a lot of folks who are trying to use this for various purposes so we have in the security team actually inside GitHub we have a counter abuse

team and they are building machine learning pipelines auto detections they're working very closely with support and trust and safety so as much as possible automate remediate and kind of shut down both spam and abuse but also things like you know crypto mining and also things like hosting binaries or content that don't meet our terms of service yeah it's definitely a

challenging both engineering and scale problem across the board. So GitHub Advanced Security is a product or a collection of products it's the name for looks like you've got a friend there in the background I can hear. Now it's fine that's the kind of security that we're looking for though at the company and then someone you'll I assume that there's a bark that you'll hear that is the

I need to get up bark as a bad guy versus the I see a squirrel bark. Unfortunately they're exactly the same also friend bark is the exact same so they're not not effective as a security tool.

You may have we have my I don't have a dog but my my niece does absolutely useless as a security oh she's like oh it'll warn me if no it won't know it will let them in and introduce the and it like bring them over you know you see those tick-tocks with like the giant German Shepherd and then like the delivery guy comes in like are you doing it? indeed that is exactly my dog. I was gonna ask about GitHub Advanced Security and understand

like is it a product is it a collection of products. So GitHub Advanced Security is a couple things so one for kind of our enterprise customers it's a product that can be purchased that includes code scanning, which is our SAST capability based on CodeQL. It includes our supply chain capabilities, which is largely Dependabot and dependency scanning. And then secret scanning, which includes push protection. We just recently announced that this is all being augmented by AI as well.

So code scanning now comes with things like autofix, so suggestions. So instead of just highlighting the potential vulnerability in the code and saying, hey, developer, you should remediate this. It's coming with an actual suggestion in the pull request to say, oh, and here's how we think you should remediate this. You can just click accept and move on from there.
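To make the "highlight the vulnerability, then suggest a fix" idea concrete, here is a minimal Python sketch of one classic finding, SQL injection, next to the kind of remediation a reviewer or an automated suggestion would typically propose. It is an illustration only: the table, the function names (`find_user_unsafe`, `find_user_safe`, `make_demo_db`) and the data are made up for this example and are not taken from CodeQL or autofix output.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory database (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text,
    # so a crafted value can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Remediated: the value is passed as a bound parameter, so it is
    # always treated as data and never as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version returns nothing because no user has that literal name.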

And then in the secret scanning space, we're using AI to not just detect high confidence patterns, which is kind of where we've been to date. You know, things like Azure tokens or AWS tokens or things like that. But also lower confidence patterns like username and passwords or SSH keys or RSA keys. So that's kind of the full GitHub Advanced Security package. Most of that is also available for free on public repos on GitHub. You know, and I have to admit, I have used all of those things.

Like if there's a free thing on a public repo, I turn it on. And like Dependabot, I could just gush. I could do a whole show on Dependabot. You know, people don't realize how good it is because it's not on by default. Right. You get the vulnerability dependencies on by default. But if you're an admin on a public repository, you just go and you check and you turn it on. And then you get Dependabot. And it's like an employee that makes pull requests while you sleep.

And it's amazing. I gush about how awesome Dependabot is. Yeah, it's really fantastic. And then we're seeing just a huge, I mean, secret scanning is a newer capability for us. But that combined with push protection, we're just seeing some incredible things happening. I mean, I'll just give an example inside of GitHub. We've really worked to, you know, reduce and eliminate secrets in code and our own code base using GitHub Advanced Security.

And being able to keep it eliminated with push protection so that nothing's getting in there after we've cleaned everything up is just an absolutely amazing capability. But, you know, being able to offer that for public repos is critical because, you know, it's not a cheap thing for us to do from an infrastructure perspective. But it's the right thing to do because we take that responsibility kind of at the center of a lot of the software development ecosystem very seriously.

Yeah. And this push protection, we should explain that a little bit because like one of the number one questions on Stack Overflow, like the top questions is I pushed a secret into GitHub, I pushed a connection string, how do I make it go away? Yeah. I mean, that's also one of the number one questions we talk to customers about a lot as well because it's, once it's in, it's in, right? It's super expensive to remediate. It's super difficult to know. It's good.

And if you talk to any security team, they're going to say, well, like once it's there, you got to, you have to deactivate it, you have to roll it. You got to change the credential. And so push protection basically sits in between the editor and the GitHub service itself and inspects, you know, it's secure, it's encrypted, but it inspects what's coming in, looking for credentials and secrets in the code before it actually goes into the Git system itself. And if it finds anything, it'll block it.
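The "inspect what's coming in and block it" step can be sketched in a few lines of Python. This is a toy, not GitHub's detector: the pattern names, the split into high and low confidence tiers, and the functions `scan` and `should_block_push` are assumptions made for illustration, using publicly documented token shapes (an AWS access key ID prefix, a GitHub personal access token prefix, and a PEM private key header).

```python
import re

# Provider-specific formats: a match is very likely a real credential.
HIGH_CONFIDENCE = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}

# Generic shapes: useful as a warning, but prone to false positives.
LOW_CONFIDENCE = {
    "generic_password": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{6,}['\"]"
    ),
}


def scan(text):
    """Return (line_number, pattern_name, confidence) for each match."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for confidence, patterns in (
            ("high", HIGH_CONFIDENCE),
            ("low", LOW_CONFIDENCE),
        ):
            for name, pattern in patterns.items():
                if pattern.search(line):
                    findings.append((lineno, name, confidence))
    return findings


def should_block_push(text):
    """Block only on high confidence hits; low confidence hits would warn."""
    return any(confidence == "high" for _, _, confidence in scan(text))
```

In this sketch a developer override or an enterprise policy would sit on top of `should_block_push`; how the real service implements those responses is not described in the episode.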

And then there's options depending on what you're doing. You can come back to the developer and say, do you want to override this? Or, you know, enterprises have the ability to adjust some of the responses there as well. So it's a pretty powerful tool. Yeah. And it really, this idea, these, these, SAS, these shared access signature tokens and their potential vulnerabilities. It's just a magic number. And if somebody gets it, they own you.

And the amount of like responsibility that can be assigned to one of these tokens that is just easily copy-pasted and given to someone is huge. And I understand I saw some stuff because I work at Microsoft in my day job. This is all public. There was some nation-state actor that got a hold of some tokens. And it was running around on some, you know, non-production build servers recently. And it all starts with those freaking tokens that you could paste to someone in a slack.

So you don't want to catch those. So that it makes me wonder though, as devs, how can we have them without ever seeing them? Like, I don't want it in my clipboard. Because once it gets into my clipboard, it could be given. It could be moved away. Is there a way to have secrets that we simply can't see that wouldn't even show up in GitHub?

It's a great question. You know, and I think, when I think about the proactive security space and kind of all the advances that are happening, I think this is an area that a lot of enterprises work towards through things like enterprise vaults and kind of accessing these secrets on demand through APIs, which is probably that's one of the right ways to do it.
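Short of a full enterprise vault, the simplest version of "fetch the secret on demand instead of keeping it in the repo" is to read it from the process environment at runtime, which is where a CI system's secret store typically injects it. The sketch below assumes a hypothetical variable name (`DEPLOY_TOKEN`) and hypothetical helpers (`get_secret`, `redact`); it is not a GitHub or vault API.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been provided."""


def get_secret(name, environ=None):
    # Look the value up at runtime; nothing is read from a checked-in file.
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


def redact(value, visible=4):
    # Mask a secret for log output, keeping only the last few characters.
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```

A deploy script would call `get_secret("DEPLOY_TOKEN")` and only ever log `redact(token)`, so the raw value never needs to pass through a clipboard or a JSON file in the repository.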

I won't say it's the only right way to do it. But, you know, having that accessible to a normal dev is more of a challenge, I think, because it's just, it's so easy just to go grab that token and toss it in the repo and keep moving to deploy the blog or do whatever. And so I think this is where normalizing things like secret scanning and normalizing and making it clear where to put secrets in places like GitHub or, you know, cloud compute like Azure is helping,

but we can still do more, I think, as a community here. Yeah, I like the call out of like normalizing it. Like right now, when I start a new project, it usually ends up in some JSON file and that's just wrong. It should be, it's wrong by default and it should be right by default. And I need to like get that into my head so that it is normal. And of course, GitHub will catch me 99.9% of the time, or as you said, it'll warn you and say, are you sure you want to do this? This token's used for

testing. But even then, I shouldn't push it through. I should do it correctly from the beginning. Yeah, I agree. And, you know, for like that, normal dev out there at a minimum, storing it as part of the secret management capability inside GitHub is a good first step. I think once you start to scale that though, there's probably more sophisticated ways to share that across an enterprise team. So I'll also talk about the third leg of that stool. We talked about secret scanning and

Dependabot, but then CodeQL. It's this code analysis tool and it'll analyze your code. It'll give you quality, but it's really becoming a security tool like it's spotting like bad practices. But you have to do this across a plurality of languages, right? Like there's all kinds of, you could be doing Erlang, you could be doing Rust, you could be doing C#. CodeQL has to

manage all of that. It does. It does indeed. And we have, we have an amazing team who is building essentially language models for each one of the languages we support and continue to evolve them. And it's not just a once and done build either because as you know, these languages evolve all the time. There's new releases. There's new versions of Python coming out, you know, constantly. And so how do we keep up with that? I think there's kind of two angles here. One is how does the team

continue to model and create what is considered a vulnerability? We approach this a few different ways. One is we actually have researchers inside of GitHub who are doing vulnerability research on open source projects and they are actually actively contributing what they're learning back into the CodeQL base, which is then made available to all of our customers and open source users,

typical devs all the way to, you know, big enterprises. And then we have started to use AI to help auto model languages faster and open source projects that are widely used so that we can actually increase the rate that we are modeling and supporting new capabilities in that program. Now I was picking random languages, but I do want to come and give a brief correction. I called out Rust and Erlang, which you know, I thought about them a little bit more on the far

side of the bell curve there, but those are not supported. You have C, C++, C#, Go. Kotlin is in beta. You've got Swift in beta, which is pretty cool. Python, Ruby. You know, what I'm struck by with this list is you've got both compiled and interpreted languages, which are two different universes that you're going to have to treat as if they're the same, but they're very different. And they're really different in terms of how we approach it. The team's making some incredible

progress to be able to tackle even the compiled languages in faster and, I'll just say, easier-to-set-up ways, because right now you often have to like integrate it into the build as you're thinking about like a Java build. The goal here is to really make this easy to turn on, easy to use and kind of easy to get to that first meaningful alert that moves the needle on the security here. And they're making a ton of progress on that. So I'm excited to see where we're

headed as a program there. The amount of like work happening to just push a hello world like the back behind the scenes work must be insane. You were talking about free compute. Like if I go and make hello.c and I check it in for me, it happens instantly. I see the file. I can set up a GitHub Action. I can make a release executable pops out and then I can start hitting raw.whatever, whatever. And I can start dropping. I can curl it immediately. But behind the scenes, you've run CodeQL,

you've scanned for viruses. There's a whole bunch of compute that happened. And you did that on a public repository. Is this going to be sustainable? Do you just make it more efficient? I don't understand how that's going to be something that will be around in 10 years, but it needs to be.

It's a good question. I think some of it's about the efficiency. And I think some of it is about the fact and this is one of the things I love about GitHub is that the leadership and the company generally values this community and values the work we're doing here and is investing in that as well. And so I think it's not necessarily a question of like, can we make this cheaper to run or not for public repos? It's like, how do we do it? Is it scalable at all? Period for some of the

capabilities. And I think it gets a little harder when we talk about GPU bound AI features. But generally, it's an important part of what we're doing as a company and as a business to support the open source community and continue to lean into that responsibility. So kind of over all of this, like looming over all of this is GitHub Copilot, which is generating code. There's a large language model at its base, but large language models are tuned for different

stuff. Some are good at generating Shakespeare and limericks and some are good at generating code. But some are just confident BSers. How do I know that Copilot or some code language model is not going to go and generate insecure code and then run it down my pipeline, my security pipeline? Sure. So kind of starting at the left side of this, the models we're using are very much

focused and tuned towards software development and the way we're incorporating those. And so we're going to continue to work on what's the right models to use for the right situation, too. And that will evolve as the product evolves as well. But then kind of if you move a little bit to the right, then we are working with Microsoft and we've got our own filters in place as well that we've partnered with Microsoft on that do things like security filtering and toxicity

filtering. So they're looking for common SQL injection vulnerabilities and are going to prevent those from even being suggested to developers before they even get there. Now this is early days and I think this is going to continue to evolve and I'm really excited actually to see over time if we see fewer and fewer vulnerabilities introduced in the editor to begin with because

I actually think we are. We're already seeing that now. But I think getting these filters, getting the tuning necessary after the suggestions to make sure it's relevant to the context of what the developer's doing, what they're trying to do, what question they asked is something we're very focused on. And then we always say even though these suggestions are good and we're seeing huge

acceptance rates here, it's a copilot. So you should still run Advanced Security, you should still check the code, you should still make sure your builds work and do all your normal tests as you would normally. I appreciate the call out about the idea of context. I use that in my talks about not just Copilot but AI in general is that if you walk up to someone and you say that they're going to like I joke about this, my wife and I we finish each other's sandwiches, you know, that's like maybe

it's context that we have because we've been married for 25 years. But when you say to GitHub Copilot, hey, write me a for loop, how much context is appropriate? Does it need to know what I wrote last week? Does it need to know what I wrote earlier today? Or can it produce what I need it to produce now? Maybe it doesn't know that I'm in the middle of red teaming or maybe it doesn't know I'm in the middle of whatever context that GitHub Copilot knows can get into an uncanny valley of creepiness

where I might feel uncomfortable that the AI knows stuff. But if it's permissive and it says follow-up questions like I don't know, not like Clippy, but you know what I mean? Like looks like you're doing some naughty things with red teaming. Would you like help versus it looks like you're really focused on security and you want your code to be extra secure. How much context should Copilot have when

it's trying to help me? Sure, I think you know this is something we talk to customers about and I know our product and engineering teams are really focused on as well and I think some of this is going to come down to where in the product and where in the workflow that the developer's engaging AI as it is. And so you know the context going into Copilot, GitHub Copilot, from autocompletion may be

open files in the editor that they're working on now. But if they've got chat, GitHub Copilot Chat, up as well and they're asking questions, then that can tune and help it understand a bit more. And so as we kind of are building GitHub Copilot into more and more of the GitHub platform in terms of enabling developer productivity end to end, I think we're going to hopefully see that be more evident to developers where it's getting it and how it's helping them be more productive.

I like that the evident part, like I don't want magic. I want CodeQL or Dependabot or any number of these tools and GitHub Advanced Security and my pipeline to let me know, hey, I found a thing. Here's why I think it's a thing and here's what I think you should do about it. Like don't just say this is a problem. Tell me why you figured it out so that I can learn and be better myself.

Sure. Yeah. No, I totally agree. And I think you know when I will pull up Copilot Chat and start asking questions, that's one of my favorite parts about it actually is it's explaining to me. Here's what we're suggesting. Here's why, hey, this is vulnerable. Hey, if you want to read more about why it's vulnerable, go click this link and it just honestly, it's a much richer experience than going to a search engine and trying to find help for the thing I'm trying to do. And it's

also way, way faster. I was editing a Jupyter Notebook the other day to like analyze some data for the teams and I needed to make some changes to it and I wasn't super familiar with the language and I was like, oh, this is going to take me hours. I'm not even going to bother with this. And I was like, oh, wait, actually, let me fire up Copilot and see what I can do. I had it fixed in like five

minutes. It did exactly what I wanted it to do and I understand why it did it. And that was just a fun thing for me given that I don't really develop every day anymore in my day job. You know, it's interesting that you call that out though because you know that idea that there are co-workers or relatives who are really good Googlers. You know, I'll have like non-technical relative try to Google for something and they'll just keep banging their head against the wall.

And then I just, I'll notice like you used too many words, use less and then I'll find it on the first try. Or there's a particular term. Hopefully Copilot and AIs will equalize that more so that everyone will get the answer that they want because what happens now is my non-technical relative will be like, oh, you're just a really good Googler. It's like, well, no, you know, I want, I want to teach you how to be able to do that. But I noticed that their solution is to simply restart

and try again. Yeah. Like give up on the query and phrase it differently. But with an AI or with a copilot, I don't just give up and start a new conversation. I try to refine it. I think that's a skill we're going to have to teach our non-technical brethren. I agree. I also think at least to me, it feels more intuitive to do that with an AI copilot. I mean, just being able to

have that context we were talking about a few minutes ago that it's already got. It already sort of knows roughly what I'm trying to do or at least what I'm seeing on my screen to a certain degree and say like, hey, can you explain this file to me? I don't have to tell it which file it is. Or can you help me rewrite this file from COBOL to Python? There's things like that where if I went and Googled that, it would take me hours and hours and hours to figure out roughly the same

thing because I would have to keep doing that iterative thing. And even though I'm not quite as good a Googler as my wife is, I'm still fair at it. Very cool. Well, I think that, in conclusion, what I'm hearing is I need to make sure that I've got Advanced Security turned on on all of my repositories, which I can do if I'm an admin on a public repository. I can turn these on. And that'll give me Dependabot, which I already love, CodeQL, which I can use and if I interact with it,

I will get nothing but good stuff. And then secret scanning and push protection is going to be fantastic as well. This is all combined within the context of GitHub Advanced Security. This is pretty cool stuff. Jacob, thanks for hanging out with me today. Thanks so much for having me. Enjoyed chatting. We've been chatting with Jacob DePriest. He's the Deputy Chief Security Officer at GitHub. This has been another episode of Hanselminutes. And we'll see you again next week.

This transcript was generated by Metacast using AI and may contain inaccuracies.