
CodeQL with Alvaro Munoz

Oct 24, 2022 · 54 min · Season 1, Ep. 16

Summary

Alvaro Munoz, a security researcher from GitHub Security Lab, discusses CodeQL, a powerful static analysis tool for finding security vulnerabilities. He explains its declarative nature, similar to SQL, and how it enables comprehensive code exploration and variant analysis at scale. The episode covers CodeQL's use in CI/CD pipelines, its developer-first approach to reduce false positives, and notable real-world bug discoveries, including an RCE in a COVID-19 tracing app.

Episode description

In this episode of Hacker Talk:

One of the most powerful of the newer static analysis tools is CodeQL.

By converting your code base into a CodeQL database, you can write read-only queries to find security vulnerabilities and other problems in your code base.


We wanted to know more about this declarative language called "CodeQL".

Straight from GitHub's Security Lab, we are joined by Alvaro Munoz!

Alvaro is a security researcher who leads a team of researchers at GitHub that leverage CodeQL to find and model vulnerabilities. His background is in research on finding remote code execution bugs through deserialization.


Tune in to hear the ins and outs of CodeQL: how to get started, how CodeQL was used to find a vulnerability in a public COVID-19 system, how to find vulnerabilities with CodeQL, and a lot more!




Topics covered:

Learning to think outside the box by playing Capture the Flag

CodeQL as a declarative language

Static code analysis

Getting a broad view of the source code

Writing queries with CodeQL to find vulnerabilities

Modeling vulnerabilities with CodeQL

The learning curve of CodeQL

Querying GitHub repositories for vulnerabilities

Writing CodeQL for a large number of repositories with LGTM (use it before it goes EOL)

Linters vs. CodeQL

CodeQL integrated with continuous integration pipelines

Getting started with CodeQL

Submitting your CodeQL queries to GitHub Security Lab's bug bounty

Best practices for writing queries

Thinking of the code as a database with CodeQL

Finding vulnerabilities in COVID-19 systems

Best practices for CodeQL

Reducing false positives

CodeQL with nvim (Neovim)

Improving Vim by creating a more interactive development environment: Neovim

LSP integration with Neovim

CodeQL with Emacs

Remote code execution bugs found with CodeQL

Bugs found in the Radar COVID app

Patterns leading to remote code execution

Auditing JavaScript frameworks

CodeQL vs. other static analysis tools

Capture the Flag CodeQL challenges

The future of CodeQL



External links:

https://lgtm.com/  

https://github.com/pwntester  

https://neovim.io/

https://en.wikipedia.org/wiki/Language_Server_Protocol    

https://en.wikipedia.org/wiki/Semgrep


COVID-19 tracing app

- https://securitylab.github.com/research/securing-the-fight-against-covid19-through-oss/

- https://threatpost.com/german-covid-19-contact-tracing-vulnerability-rce/161419/


GitHub Security Lab website: https://securitylab.github.com/


Join the GitHub Security Lab Slack channel:

https://join.slack.com/t/ghsecuritylab/shared_invite/zt-120w4vby8-_O9u9k2hPfgbju1tddBPcg


https://twitter.com/pwntester

Bounty program: https://securitylab.github.com/bounties/

https://codeql.github.com/

https://codeql.github.com/docs/codeql-overview/  

http://www.pwntester.com/

https://en.wikipedia.org/wiki/Abstract_syntax_tree  

https://en.wikipedia.org/wiki/Control_flow_analysis

https://github.com/github/codeql-learninglab-actions

https://github.com/anticomputer/emacs-codeql/   


Special thanks to:

We want to give a huge thanks to GitHub's Security Lab team for making this episode a reality!


Transcript

Alvaro's Early Hacking Journey

Hello, all listeners, and welcome to this episode of Hacker Talk. My name is Philip and I will be your host for today's episode. What if I told you there's a new way to find security vulnerabilities in your code base, using something called CodeQL?

Introduction to CodeQL and Guest

But what is CodeQL? How do we use it? What is all the fuss about? In this episode of Hacker Talk, we're joined by Alvaro Munoz. Alvaro is a security researcher known for his work on deserialization bugs, a speaker at various conferences, part of the GitHub Security Lab team, and our guest on today's episode. Welcome to Hacker Talk, Álvaro Munoz. How are you doing today? Thank you. Very excited to be here today. Awesome. So, for all our listeners who have no idea who you are...

From ITIL to Professional Security

How did you get into security and software? And was there any light-bulb moment where you realized, wow, security is pretty cool? If I have to go back to that moment, then I have to go back to my teenage days, back in high school, I guess. I was playing with computers; that was back in the '90s. I'm old, I know. But at that time there was no public internet access, at least in Spain, where I'm from. So basically I was

playing around with BBSes. If you are old enough to know about BBSes, it stands for Bulletin Board Systems. Those were basically computers that people shared access to with other people via modems, basically over telephone lines. So you had to call into other people's houses and get a response from these BBS systems, and then you were able to browse through a number of files that were served.

And I was visiting some of these BBSes for hacking information, because I was kind of interested. I learned how to program in BASIC on an MSX back in the day; my father taught me. And then I got very interested when I got my first 486 computer and then my Pentium, which was running Windows, right? I remember that I wanted to learn about hacking, because that sounded very interesting and very cool to me back in my teenage days. Then

I found a lot of information in these BBSes about Unix commands and Unix operating systems, Solaris, etc. But I didn't have access to any of those systems in my home lab, which was basically this Pentium. And then I learned how to hack my way into CompuServe to get internet access and more documentation about these open systems. And finally, I was able to get some accounts on some servers in the US, I think, where I was able to play with all those commands that I

was learning about from the manuals and tutorials that I found. So I kept learning about programming and about different languages. And then I started college and kind of forgot about this interest, or switched to other interests like music, guitars, things like that. When I was finishing college, I got this interest back again and started my work

related to computers, but it was not related to security at all. I was working in the ITIL world, and at some point I learned that the company I was working for, which was HP at that time, was acquiring a security company called Fortify. That was my chance to jump into Fortify and start working with static analysis

and application security at a professional level, and that's where everything started more seriously, I guess. Oh, that's cool. Yeah, I remember that back when I started college, I had this documentation that I had learned from these BBSes, and I knew about things like John the Ripper, and then I used that

in the first year of university to get access to... I don't know if I'm supposed to talk about this, but let's say that I was able to get access to the labs and play with those systems that I didn't have access to at home. So that was the start of my career in infosec. And I've never broken the law, I guess, except maybe for that specific episode.

Importance of CTFs and Lateral Thinking

But then I started playing CTFs, and I think that CTFs have been a really important part of developing my skills, learning about security, and becoming really interested in security. I started playing CTFs by myself. At some point, I was contacted by what was at that time the most important Spanish

CTF team, int3pids, who were always at the top of the CTF leaderboard. They were very good, they are still very good, and I learned a lot with them. Before joining int3pids, I was doing all kinds of security-related challenges in CTFs, from exploitation to reverse engineering to web and crypto, everything.

When I joined int3pids, I guess I specialized in web and high-level languages. That's basically where I developed my interest and my specialization, and that's where I ended up working. That's dope. That's dope. Do you still do CTFs? Not that much, since I have two kids and they basically take all the free time that I used to have. So I don't know.

Maybe one day when they are older, I will be able to have free time again. But so far, that seems like a dream to me. Hopefully in the future, I will be able to.

That's so fun. Yeah, CTFs are so fun, but sometimes you spend hours and hours on the challenges. Yeah. People often approach me and ask how they should get started with infosec or application security, and I don't know, that depends on the specific person: for example, the way they learn or the way they get motivated.

But for me, what worked was CTFs, because I really learned how to think outside the box, this lateral thinking, as they call it. And that's what got me so many vulnerabilities in the years after: being able to think in a non-straightforward way, trying to, you know, think outside the box and find these malicious kinds of mental models where you think about how to break things rather than

building things, trying to develop this kind of mindset. Yeah, exactly. That's very important to have, because it makes your code completely different. You're so much more defensive when you're writing code, I feel like.

CodeQL: A Powerful Static Analysis Tool

How do all these roles lead to CodeQL, and what is CodeQL? Well, basically, as I said, I started working at Fortify, which is also a static analysis vendor. And at some point in my career, I decided to move to GitHub. There I was going to do security research, so not really bound to any tool at all, like CodeQL or anything like that. But I guess that, as we say in Spain, and I don't know if there is any

similar saying in English, the goat always heads for the mountain. So if you really like something, you end up doing it. That's why I started playing with CodeQL a lot within GitHub. And today I lead a team of junior security researchers who use CodeQL to find vulnerabilities in open source software at scale.

So, back to your question: what is CodeQL? CodeQL is a static analysis tool, or solution, that helps you find vulnerabilities by analyzing the source code at rest, if you want to put it that way. We are not running the source code at any moment, which, from one point of view, gives you visibility into the whole source code.

When you use a black-box approach, running tools like Burp or any other dynamic analysis tool, you will be able to find vulnerabilities in the code that you are able to exercise with your requests. But there will be many vulnerabilities that you don't exercise because you didn't trigger them correctly, and you won't be able to trigger them if you don't have visibility into the source code. If you don't see that the source code is checking for this

query parameter, and that if it's present it's going to do this or that, then you won't be able to trigger that dynamically. So I always found static analysis more complete in that sense: you have full visibility, and sometimes you even find bugs that you are then not able to trigger, because they sit in places in the code that cannot be exercised from the outside, from the attack surface.
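A toy Python sketch of that visibility argument, using the standard `ast` module rather than CodeQL. The handler and its `debug_expr` parameter are hypothetical: random black-box probing with common parameter names never reaches the `eval` branch, while a walk over the syntax tree reports the call no matter how it would be reached.

```python
import ast
import random

# Hypothetical handler: the dangerous branch only runs when an undocumented
# query parameter is present.
HANDLER_SOURCE = """
def handle(params):
    if "debug_expr" in params:
        return eval(params["debug_expr"])
    return "ok"
"""


def dynamic_probe(handler, trials=1000, seed=0):
    """Black-box style: send random common parameter names and report
    whether any response differed from the normal one."""
    rng = random.Random(seed)
    names = ["id", "page", "q", "sort", "lang"]
    for _ in range(trials):
        params = {rng.choice(names): str(rng.randint(0, 9))}
        if handler(params) != "ok":
            return True
    return False


def static_find_calls(source, func_name):
    """White-box style: list the line of every call to func_name,
    whether or not any request would ever reach it."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


namespace = {}
exec(HANDLER_SOURCE, namespace)  # defines handle() for the dynamic probe only
print("dynamic probe reached the eval branch:", dynamic_probe(namespace["handle"]))
print("static scan found eval on lines:", static_find_calls(HANDLER_SOURCE, "eval"))
```

Real dynamic tools are far smarter than random probing, but the point stands: without reading the code, nothing tells them the magic parameter exists.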

But yeah, I guess that's the upside of static analysis: visibility into the whole code base. And the downside is that it's static. You are not running the code, so you have to emulate

Static vs. Dynamic Analysis Benefits

what the code would be doing if it were run. In some languages, like Java, which is statically typed, this can be a pretty accurate analysis. In other languages, such as JavaScript or Python, you have to make a lot of assumptions about the type of an object. In Java you always know the type, but in Python, for example, you have to make assumptions, and depending on those assumptions you will get better or worse results.
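A hypothetical Python snippet shows the kind of assumption he means: which module `.loads` is called on depends on a runtime string, so a static analysis has to guess whether the call is the harmless `json.loads` or the dangerous `pickle.loads`. The function names `pick_codec` and `decode` are made up for illustration.

```python
import json
import pickle


def pick_codec(name):
    """Return a module chosen at run time from a string."""
    return pickle if name == "pickle" else json


def decode(codec_name, payload):
    # Statically, the receiver of .loads could be json or pickle. json.loads
    # is safe on untrusted text; pickle.loads on untrusted bytes can execute
    # code. An analyzer must assume one, both, or neither.
    return pick_codec(codec_name).loads(payload)


print(decode("json", '{"user": "philip"}'))
```

In Java, the declared type of the receiver would settle this question without any guessing.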

And yeah, CodeQL is basically a platform to perform static analysis. It's used a lot for what we call variant analysis. Imagine you are a security engineer. You receive a report from an external security researcher telling you that they found a vulnerability in your code. You can take that code, understand what the vulnerability was, and then model it with CodeQL. Modeling means expressing the same pattern that defines this vulnerability.

You express that in CodeQL, so you can run the query and find similar patterns appearing in the very same codebase: variants of that original bug. Or, once you have that bug pattern modeled, you can run it at scale on hundreds of thousands of repos and quickly find more vulnerabilities across thousands of them.
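As a rough illustration of variant analysis, the Python sketch below uses the standard `ast` module instead of CodeQL: one bug pattern is modeled once as a predicate and then run over several repositories. The reported bug (a call passing `shell=True`) and the repository names and contents are hypothetical, and the pattern is deliberately simpler than a real CodeQL query.

```python
import ast


def is_shell_true_call(node):
    """The modeled bug pattern: any call passing shell=True as a keyword."""
    return isinstance(node, ast.Call) and any(
        kw.arg == "shell"
        and isinstance(kw.value, ast.Constant)
        and kw.value.value is True
        for kw in node.keywords
    )


def find_variants(repos, pattern):
    """Run one modeled pattern over many repositories and collect hits
    as (repo, path, line) tuples."""
    hits = []
    for repo, files in repos.items():
        for path, source in files.items():
            for node in ast.walk(ast.parse(source)):
                if pattern(node):
                    hits.append((repo, path, node.lineno))
    return sorted(hits)


# Hypothetical repositories: the first holds the originally reported bug,
# the others are where variants may hide.
REPOS = {
    "reported-repo": {"app.py": "import subprocess\nsubprocess.run(cmd, shell=True)\n"},
    "other-repo": {
        "tasks.py": "import subprocess\nsubprocess.call(user_cmd, shell=True)\n",
        "safe.py": "import subprocess\nsubprocess.run(['ls', '-l'])\n",
    },
    "third-repo": {"util.py": "import os\nos.system(cmd)\n"},
}

for hit in find_variants(REPOS, is_shell_true_call):
    print(hit)
```

The design point is the separation: the pattern is written once, and the loop that applies it does not care whether there are three repositories or three hundred thousand.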

CodeQL as a Declarative Language

So you've got to be good at writing queries for it. Yeah. And this can be a little bit, not problematic, but CodeQL is a declarative language, and people are more used to working with imperative languages, languages where you define how you want things to happen, right? If you think about Java, Python, JavaScript, all of those are imperative languages where you tell the program what you want it to do,

like loop through all the items in this array, and for each of them do this or that: check if it starts with this prefix, and if it does, do this or that. That's like following a recipe. A declarative language is the complete opposite. You just specify the what instead of the how, and the engine basically figures out how to connect

database tables, or how to get the information that you want. If you think about a declarative language, you will probably come up with SQL, where you say, OK, I want all the users where the username is, I don't know, Philip, and the last name is whatever, and the age is whatever. Then you get those results, and you don't need to tell

this SQL engine how to join those tables or how to get the information. SQL does the magic for you. Exactly. You will just get a list of the users, and you just express it

in a very expressive language, like SELECT the username, the age, and the address FROM users, and it may join three different tables, or however it's implemented in the back end of the database, and it will get the results for you. And CodeQL is exactly that. You don't tell the engine how to get you the stuff that you are after; you just tell it, OK, I want all the method

calls to a method called execute, for example, or eval, where the argument to this method comes from untrusted data, from data that can be controlled by a user. And that's it. Just return all the places in the code that call eval with untrusted data, and CodeQL will return all those places. So you can use CodeQL to find vulnerabilities, for example by...

Exploring Codebases with CodeQL

modeling a vulnerability like SQL injection. Or you can also use CodeQL to explore a code base. If you are into code audits and code review and you are faced with 10,000 lines of code, you don't even know where to start looking for vulnerabilities. But with CodeQL you can start asking questions of the code base, right? Like, what are the places where untrusted data enters this application? What are the places where

I don't know, deserialization operations are performed? No matter whether we have evidence that the data being deserialized comes from an untrusted source or not, I want all the places where data is deserialized, or all the places where IO operations are performed, like files being opened, read, or written to.
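The same kind of exploratory question can be sketched in plain Python with the `ast` module, again as a stand-in for CodeQL: list every place that deserializes data or touches the file system, without asking where the data comes from. The category lists and the sample code are illustrative assumptions, not a complete catalog of dangerous APIs.

```python
import ast
from collections import defaultdict

# Hypothetical "questions" about a code base, expressed as the dotted call
# names that belong to each category of interest.
CATEGORIES = {
    "deserialization": {"pickle.loads", "pickle.load", "marshal.loads"},
    "file_io": {"open", "os.remove", "shutil.copy"},
}


def dotted_name(func):
    """Turn a call target such as pickle.loads or open into a dotted string."""
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        base = dotted_name(func.value)
        return f"{base}.{func.attr}" if base else None
    return None


def ask(source):
    """Return, per category, the line numbers of every matching call,
    regardless of where the data passed to it comes from."""
    answers = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            for category, names in CATEGORIES.items():
                if name in names:
                    answers[category].append(node.lineno)
    return {category: sorted(lines) for category, lines in answers.items()}


CODE = """\
import pickle, os
def restore(path):
    with open(path, "rb") as fh:
        return pickle.loads(fh.read())
def cleanup(path):
    os.remove(path)
"""

print(ask(CODE))
```

A reviewer would then read each reported line by hand, which is exactly the audit-driving workflow described above.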

You can ask questions about the code, and then, as a manual code reviewer, you can use that information to drive your audits and be more effective. That's a really good overview.

CodeQL Learning Curve and Scale

I think it's a really cool approach. And being able to find, like you said, eval, which is a very, very bad feature... How long have you been working with CodeQL? Well, I joined GitHub almost three years ago.

It will be three years in January, and I started really playing with CodeQL maybe two years ago. And yeah, I can tell you that even though the learning curve is a little bit steep, once you get the basics of working with a declarative language, which you may not be used to, and once you get used to expressing these patterns in a declarative language,

then it's much easier than you may expect. So I think you need to stick around for the time it takes to learn the basics, and then you will be rewarded with a very powerful tool to find bugs and to model other people's bugs. I have done this a lot of times: someone reports a CVE.

It comes out over the weekend, and then on Monday I come back to work, model that CVE with a CodeQL query, run it on all GitHub repos, and find many vulnerabilities that we report to the maintainers. That gives you a really powerful tool. I mean, I can't imagine another tool that runs at this scale. Dynamic analysis is great for

when you're focusing on a single application. But if you want to run it at scale, on thousands and thousands of applications, just think about having those thousands of applications up and running, setting them all up; that is next to impossible. With static analysis, for interpreted languages like Python, JavaScript, or Ruby, you don't even need to compile them.

You can just throw them at the CodeQL extractor, which takes the source code and extracts what is called the AST, the abstract syntax tree, an abstract representation of the source code. It stores that AST in a CodeQL database, and then you are ready to query that database with whatever queries you want to run. OK. So if I want to do this on a local...

On a local repository, I can set up CodeQL with the repo. But what if I write a cool new exploit and want to search the entire GitHub for it?

LGTM.com and Community Engagement

How can I, as a consumer or private person, do that? So this is still possible on lgtm.com. What is lgtm.com? Well, lgtm.com is a platform that unfortunately is going to be deprecated, so if you want to play with lgtm.com, you have to hurry. The idea is that on lgtm.com you have a CodeQL

query console, so you can just go to the site, open the query console, start writing CodeQL, and run those queries on all the hundreds of thousands of open source repos that we have onboarded on LGTM. There are also some scripts out there, so if you are interested in getting access to these scripts to run these queries

at scale on these repos, you should visit our Slack channel. I will provide you with a link later because I don't have it right now. It's a private channel, but it's open to anyone who wants to join: anyone who asks for an invitation gets one. We discuss things about CodeQL there. You can ask any question and talk directly with the CodeQL engineers and with the Security

Lab researchers who use CodeQL. So you get different perspectives: the people who create and write CodeQL and the people who use CodeQL to find security vulnerabilities, and you can just ask your questions, etc. That's awesome. That's really useful

CodeQL Adoption and Accuracy

to have a community like that. How is the user adoption of CodeQL? Is it mostly bug hunters using it for bug bounties, or...? So CodeQL has been adopted mostly in the form of code scanning. Code scanning is a feature that you can enable in your open source repos on GitHub, where every time you push a commit or open a PR in your repo, it will be automatically scanned by CodeQL.

Right. So this way, if you introduce a security vulnerability in your source code, you will be notified as soon as possible, or you can even add a check that makes the PR fail if it has code scanning alerts. So I would say that the most common use of CodeQL by developers is integrated into CI/CD pipelines. Obviously there are many security researchers using CodeQL

to look for vulnerabilities, like we do at Security Lab. But if you look at the number of developers using CodeQL integrated into their CI/CD pipelines versus security researchers,

I would say that developers are probably using it more than security researchers. And that's a good thing, because static analysis tools are known to be a little bit noisy in terms of false positives, but CodeQL was developed with a developer-first approach, where they prefer a false negative, that is, not reporting something, over flooding developers with false positives. With any static analysis tool, you have to move the needle between these two: being very

accurate and only reporting things that you know for sure are security vulnerabilities, or moving the needle toward what I call research mode, where you get more false positives but also fewer false negatives. As a security researcher, I don't mind triaging and reviewing 100 false positives if one of them is a true positive. Right. For example, if I have to review a hundred false positives in Log4j and one of them is the Log4j issue, the true positive,

then that's time well spent for a security researcher. Now, from a developer's point of view, if you receive a report and it's a hundred false positives, the next thing you're going to do is disable that tool and never run it again. This is the reason that, when running CodeQL, the default mode is a developer-first mode, where it tries to get

results that are as accurate as possible. Now, as a researcher, this is a completely flexible tool. You can enable this research mode by making the analysis maybe less accurate, but

CodeQL vs. Linters and AST Grepping

it will also result in fewer false negatives. That's true, that's true. You're touching on important topics here, because I feel like a lot of developers get these static analysis tools, and if there are too many false positives, they just turn them off in the CI. What would you say? I know we have a lot of listeners who have tried to do in their CI systems something similar to what CodeQL does, but they're using

all these kinds of static linters and ways of describing bad patterns in the code. Do you think they should just get rid of all the old linters and replace them with CodeQL in their CI? Well, linters may be used for a number of reasons: not just security, but also quality, or following some policies about conventions in your company, or whatever.

So if you're using linters for that, CodeQL can obviously be used for that too, but then you have to write all the queries yourself. The queries that come bundled with CodeQL are security-related, plus some quality-related ones, right? If you use linting for things like, you know, having some convention about, well, how to...

Whether to use single quotes or double quotes, or things like that, that's something CodeQL is not going to do. Now, if you think about static analysis, there is a large spectrum, where you can move from very simple AST grepping, right? Well, you can go even further down this spectrum and just grep the text. But this is not

useful, because you will get a lot of false positives. Then you can grep the AST, which means extracting the AST and then grepping on it. Using this approach, you are more accurate. For example, if you import a library in Python under a different name, import foo as bar, then assign bar to a variable, and then run a method on that variable,

then a simple grep won't find that you are invoking this method on the library you imported. But if you use an AST-based grepper, it can find these kinds of calls, right? Now, on the other side of the spectrum, we have full program analysis: static analysis solutions like CodeQL, which can perform things like data flow analysis, following

user-controlled data through the application as if you were running it. They can perform control flow analysis, they can apply different layers of abstraction to the code, and they perform really accurate and precise analysis. Or you can also run tools like CodeQL in a very simple way, like matching the AST. So it's up to you. You can think of CodeQL as a Swiss Army knife

where you can use the most powerful tools, or, if you want to find all the calls to a given method and just grep the AST for that, you can write a two-line query that does exactly that. This will be really fast, and it will integrate very well with CI pipelines. Now, if you want to go to the opposite side of the spectrum, with CodeQL you have the possibility to go all the way

to both ends of the spectrum. You can go to full program analysis, where you can run really powerful data flow and taint tracking engines on your code to find this kind of injection issue, where untrusted data enters your application in one place and then flows down through multiple libraries and interprocedural

calls until it reaches a sink, a place where some dangerous operation happens using this untrusted data.
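The `import foo as bar` case from this answer can be sketched in Python, with `subprocess.check_output` standing in for "a method on the imported library". This is a toy AST grepper, not CodeQL: it only looks at top-level imports and plain `name = alias` reassignments, and it has no real data flow.

```python
import ast
import re

# Hypothetical snippet: the library is imported under an alias, the alias is
# copied into a variable, and the method is called through that variable.
SOURCE = """\
import subprocess as sp
runner = sp
runner.check_output(cmd)
"""


def grep_find(source, module, method):
    """Plain text search for the fully qualified call."""
    pattern = re.compile(rf"\b{re.escape(module)}\.{re.escape(method)}\(")
    return [n for n, line in enumerate(source.splitlines(), start=1) if pattern.search(line)]


def ast_find(source, module, method):
    """Resolve aliases of `module` in the syntax tree, then report calls to
    `method` made through any of those aliases."""
    tree = ast.parse(source)
    aliases = {}
    for node in tree.body:  # top-level statements only, in source order
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module:
                    aliases[alias.asname or alias.name] = module
        elif isinstance(node, ast.Assign) and isinstance(node.value, ast.Name):
            if node.value.id in aliases:
                for target in node.targets:
                    if isinstance(target, ast.Name):
                        aliases[target.id] = module
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == method
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id in aliases
    )


print("grep finds:", grep_find(SOURCE, "subprocess", "check_output"))
print("AST grepper finds:", ast_find(SOURCE, "subprocess", "check_output"))
```

The grep misses the call because the text never says `subprocess.check_output`; the AST version catches it only because it tracks one level of aliasing, which is where full data flow analysis takes over.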

Getting Started and Bug Bounty Program

If someone wants to roll up their sleeves and start writing queries, is there any repo or any good queries they can start with as examples? All the queries are open source. You can just go to github.com/github/codeql and you will find the source code for all the queries, for all the languages. We support, I think, up to nine different languages. And also, if you want to start writing CodeQL queries,

there are a bunch of learning labs on GitHub. You can just search for GitHub Learning Lab CodeQL and you will find a bunch of them for different languages, for example JavaScript, Java, and C/C++. These learning labs are based on real vulnerabilities that we found in the Security Lab, and we created them to show how to start from scratch, not knowing

CodeQL or not even having it installed: installing it, learning the basics, and then finding a real security vulnerability using CodeQL. So those are really good resources to get started with CodeQL. Also, if you want to get started on this CodeQL journey and you are looking for motivation, an economic motivation, we are running a bug bounty program

where, if you submit a query that finds bugs in open source projects, or that covers a vulnerability category not already covered by the default queries,

then you can submit it to our bug bounty program and get rewarded, I think from around 1,000 dollars up to maybe five or six thousand dollars, depending on the scope of the query, its complexity, etc. I think this is a very good motivation for people to learn CodeQL and use it for their own research

but also to contribute those queries back to the community, so all the people running CodeQL on open source projects can run those queries as well, and to get rewarded for that. Yeah, that's awesome.

Best Practices for CodeQL Queries

Having that library of queries and being able to lurk around in it is great. Do you have any best practices for writing queries, or any mindset you use when writing them, to get fewer false positives?

I have a mindset of thinking about the code as a database. For example, imagine that you have a database with all the constructs in the code: a table for all the method definitions, another table with all the method calls, another with all the variables, another with all the variable accesses. Then you're just joining and querying those tables. This helps you get started with a declarative language like CodeQL.
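This "code as a database" mindset, and the SQL analogy from earlier, can be sketched with Python's built-in `sqlite3` and `ast` modules. It is a toy, not CodeQL's real schema or extractor: the table names `calls`, `call_args`, and `user_vars` are hypothetical, a variable counts as user-controlled only if it is assigned directly from `input()`, and there is no real data flow. The final query states what we want (calls to `eval` whose argument is user-controlled) and leaves how to join the tables to the database engine.

```python
import ast
import sqlite3

SOURCE = """\
name = input()
greeting = "hello"
eval(name)
eval(greeting)
print(name)
"""

# Declarative question: which eval calls receive a user-controlled variable?
QUERY = """
SELECT DISTINCT c.line
FROM calls AS c
JOIN call_args AS a ON a.call_id = c.id
JOIN user_vars AS u ON u.var = a.var
WHERE c.callee = 'eval'
ORDER BY c.line
"""


def is_input_call(node):
    return isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "input"


def extract(source):
    """Toy 'extractor': store a few code constructs as relational tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE calls (id INTEGER PRIMARY KEY, callee TEXT, line INTEGER);
        CREATE TABLE call_args (call_id INTEGER, position INTEGER, var TEXT);
        CREATE TABLE user_vars (var TEXT);
    """)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and is_input_call(node.value):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    db.execute("INSERT INTO user_vars VALUES (?)", (target.id,))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            cur = db.execute("INSERT INTO calls (callee, line) VALUES (?, ?)",
                             (node.func.id, node.lineno))
            for pos, arg in enumerate(node.args):
                if isinstance(arg, ast.Name):
                    db.execute("INSERT INTO call_args VALUES (?, ?, ?)",
                               (cur.lastrowid, pos, arg.id))
    return db


def eval_on_user_data(db):
    return [row[0] for row in db.execute(QUERY)]


print(eval_on_user_data(extract(SOURCE)))
```

`eval(greeting)` is excluded because `greeting` is a constant, and `print(name)` is excluded because `print` is not the sink.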

In terms of reducing false positives, it's more about starting with a broad query, for example, finding all untrusted data flowing from any untrusted source to, let's say, some code execution sinks, right? If you start with that, you will probably find all the true positives, but you will also get some false positives,

because the data may pass through sanitizer nodes in the data flow graph: places in the code where the untrusted data is checked to be an integer, for example, or checked against an allow list so it can only be one of certain values. Then you may want to write some sanitizers

to improve the accuracy of the query and make it more precise, right? As you write sanitizers, you start removing false positives. So in general, start pretty broad, with the easy query from source to sink, and then, if you get false positives, refine the query and add sanitizers that exclude the paths from the source to the sink where the user-provided data gets sanitized.
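The "start broad, then add sanitizers" workflow can be modeled in a few lines of Python. This is a toy taint tracker over a straight-line program, not CodeQL's data flow library: the operation names `request_param`, `concat`, `run_code`, `to_int`, and `allowlist_check` are hypothetical, and real analyses handle branches, calls, and aliasing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """One step of a toy straight-line program: target = op(args...)."""
    target: str
    op: str
    args: tuple = ()


def find_flows(program, sources, sinks, sanitizers=frozenset()):
    """Return the indices of sink statements reached by tainted data.

    Taint starts at any statement whose op is a source, propagates through
    any op that takes a tainted argument, and is stopped by sanitizers."""
    tainted = set()
    findings = []
    for i, stmt in enumerate(program):
        arg_tainted = any(a in tainted for a in stmt.args)
        if stmt.op in sinks and arg_tainted:
            findings.append(i)
        if stmt.op in sources or (arg_tainted and stmt.op not in sanitizers):
            tainted.add(stmt.target)
        else:
            tainted.discard(stmt.target)
    return findings


PROGRAM = [
    Stmt("a", "request_param"),           # 0: untrusted input
    Stmt("b", "concat", ("a",)),           # 1: still tainted
    Stmt("_", "run_code", ("b",)),         # 2: real issue
    Stmt("c", "request_param"),            # 3: untrusted input
    Stmt("d", "to_int", ("c",)),           # 4: integer check
    Stmt("_", "run_code", ("d",)),         # 5: false positive without sanitizers
    Stmt("e", "request_param"),            # 6: untrusted input
    Stmt("f", "allowlist_check", ("e",)),  # 7: allow-list check
    Stmt("_", "run_code", ("f",)),         # 8: false positive without sanitizers
]

broad = find_flows(PROGRAM, {"request_param"}, {"run_code"})
refined = find_flows(PROGRAM, {"request_param"}, {"run_code"},
                     sanitizers={"to_int", "allowlist_check"})
print("broad query:", broad)
print("with sanitizers:", refined)
```

The broad query reports all three sinks; adding the two sanitizers leaves only the genuine flow at statement 2, without touching the source or sink definitions.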

Alvaro's Custom CodeQL Setup

What does your current setup look like? How do you work with CodeQL? What IDE do you use with it? My setup is completely different from, I would say, 99% of the people using CodeQL, so I'm not a good example. Like I said at the beginning of the talk, I'm kind of an old person, and I'm really into Vim, and Neovim to be more specific. I developed my own Neovim plugin to work with CodeQL. So the IDE that I use

to work with CodeQL is Neovim. I just use the terminal and Neovim, and I write tests and develop everything there. Why Neovim over Vim? That's a good question. At some point Neovim was a fork of Vim

where they started developing things that Bram, the main maintainer of Vim, was not accepting as pull requests. Things like being able to run async jobs in Neovim without blocking the whole editor. In the past, if you were running a linter, for example (you mentioned linters), you had to wait, because it was the same process, right? So it was blocked until the linter finished, and your editor was blocked

and you were not able to type anything until that finished. So, Thiago Forte, I think is the name of the founder, if you want to put it this way, of the main maintainer of NeoBeam at that point in time. I think he says he's no longer working actively in this repo. But at this point, he submitted some pull requests to Veeam that were not accepted. So he decided to just fork it and create a forward. a more modern version of BIM.

could be possible and then from there people started contributing crazy ideas which were at that time not accepted so i think that vim has changed their mindset in the last year so they are now able to run async jobs and other crazy things that are being implemented in NeoVim, like, for example, three-seater support for better highlighting, for better movements around the code, etc. They are also being ported back or implemented in Vim.

But now, these days, it's Vim playing catch-up with Neovim. Neovim is ahead, developing all these crazy ideas, like, for example, LSP (Language Server Protocol) integration to make Vim a real IDE and not just a text editor, and soon after, Vim implements that as well. But Neovim is more like... That's amazing. I use that for Rust when I write Rust code, all the time, to do it remotely. It's awesome.

Yeah, so think about writing things with an LSP server, with all the linting, et cetera, that LSP provides. And with CodeQL, going back to the question:

The official IDE is VS Code, and you have the official, supported CodeQL extension that provides the means to run the queries, browse through the results, etc. There is also an LSP server, and because of the way they implemented this CodeQL extension, with an LSP server, a query server, etc., you can just write your own plugin for a different text editor or a different IDE.

For example, what I did is basically just use the CodeQL LSP server that is already developed, and I can just call it from Neovim without writing a lot of lines of code. So it's very easy to integrate into Neovim? Basically just using the API? Yeah, I'm basically using the API for the query server, which is in charge of running the query and getting you the results, and the API for the LSP server, which provides you with

errors, warnings, hints, things about your query, so you can get feedback on your query. Cool, cool. And this is available on GitHub, right? Yeah. So the official option, as I said, is VS Code with the VS Code extension. That's what people should be using, unless they are really crazy about Vim or Emacs, because there is another guy on my team who is as crazy as I am, but with Emacs instead of Vim, and he wrote his CodeQL plugin for Emacs, using a lot of parentheses, I guess.

But yes, you now have those three choices. You can also use the CLI, because you can basically write your query in Notepad if you want and then run it using the CLI, where you basically say, okay, I want to run this query on this database, and you get the results as JSON, as CSV, or as a SARIF file, which is a standard format these days for dumping static analysis results.

And then you can just open those files in Notepad if you want. What have you found so far with CodeQL?

Major CodeQL Vulnerability Finds

If you've been doing it for two years, you must have found a lot of bugs. Do you have some highlights, some interesting bugs that you found along the way? I have found many, many bugs. I focused on analyzing Java, and also on vulnerability categories leading to remote code execution. So I have found, I would say, hundreds of them. I really like one which I found back in the COVID days, in 2020 probably,

a few months after the pandemic started, basically. So if you remember those days, there were these apps, we called it Radar COVID, on your mobile phone: if you were exposed to other people who were infected, you got a notification on your phone, right?

They used the Android and iOS exposure notification systems, basically. If you reported that you were sick, that was uploaded to a database, and then anyone who had been around you for more than 15 minutes, closer than, I don't know, 20 meters, got a notification that they had been exposed to the virus.

Right? Remember these applications? They were implemented in all countries. I think in Spain it was called Radar COVID. But I found that the backend server for the German implementation of this system was vulnerable to remote code execution. And the way that I found it: well, first of all, I found a new pattern leading to remote code execution, which was a novel thing, if you were using Hibernate Validator.

And this was kind of funny, because you were validating your data, but because you were validating it in a certain way, you were actually making your application vulnerable to remote code execution. You can look it up; the official name is something like bean validation. But yeah, I can even provide you with a link to the article that we wrote.
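As a rough analogy for how validation can end in code execution, the Python sketch below models a message template in which `${...}` expressions are evaluated. It is not Hibernate Validator's API; roughly, the Java issue involves custom validators that put user-controlled data into the constraint violation message template, which is then interpolated as an expression. The `interpolate`, `insecure_violation_message`, and `safer_violation_message` helpers are hypothetical, and the toy evaluator deliberately understands only integer `+`, `-`, and `*`.

```python
import ast
import operator
import re

# Toy "expression language": only integer + - * are supported, to keep the sketch safe.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def interpolate(template):
    """Replace every ${expr} in a message template with its evaluated value."""
    return re.sub(r"\$\{([^}]*)\}",
                  lambda m: str(_eval(ast.parse(m.group(1), mode="eval").body)),
                  template)


def insecure_violation_message(user_value):
    # Bug pattern: untrusted input becomes part of the template itself.
    return interpolate(f"Invalid value: {user_value}")


def safer_violation_message(user_value):
    # Keep untrusted input out of the template; append it after interpolation.
    return interpolate("Invalid value: ") + user_value


print(insecure_violation_message("${7*7}"))  # Invalid value: 49
print(safer_violation_message("${7*7}"))     # Invalid value: ${7*7}
```

In this toy model the insecure version evaluates the attacker's expression while the safer one echoes it as plain text; real fixes for Hibernate Validator code may look different, for example escaping or parameterizing the message.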

And this application was validating the data that you could submit to the system. They were making sure that it followed certain constraints and limitations, but because they were doing that in an insecure way, you were able to trigger remote code execution on that German server. You could basically disrupt this system

for the whole of Germany. So that was a little bit crazy. SAP were the people maintaining and developing most of the code, even though it was open source, because the whole idea was to build trust in these applications. There were certain concerns at that time that people might get spied on because they were installing things on their mobile phones, and that they were going to be able to know

where you were located, etc., things like that. But those applications were made open source so everyone could see that both the mobile application and the backend were not collecting any private or sensitive information. The application was developed by SAP, and they were very quick to fix the issue and

publish the fix. So yeah, back to CodeQL: I found this by modeling a pattern that I had found myself. I first found it in Nexus, which is Sonatype's artifact repository. I found a couple of instances there and reported them to Sonatype. Then, talking back to this variant analysis concept, I modeled this pattern using a CodeQL query, a very simple one, maybe 20 lines of code, and then I ran it

on Nexus, and I found another one that I had missed, so I reported that one as well. And then I ran it on open source projects. At that time, there were like 20,000 Java projects, and I found it in many of them. We reported the issue, and one of them was this COVID radar system. I don't remember the name of the German implementation.

But yeah, I would say that that one was one of the most interesting because of the impact and the criticality at that point in time, with COVID. That's awesome. How fast did they patch it? It took them like a couple of days. Yeah, no more than 48 hours, I guess. So they were very, very quick with that.

Cool. So I guess you look at a lot of security vulnerabilities that get released all the time. What is the latest, most interesting security vulnerability that you read about? Sorry, say that again? I guess you look at a lot of security vulnerabilities all the time and try to write queries for them. What's the latest interesting vulnerability you read about where you thought, wow, this is a really smart, innovative way they found it?

So these days I'm more into managing the team than writing CodeQL queries myself, but, as I said previously,

during the weekend maybe I read a couple of blog posts and write-ups about vulnerabilities that are being disclosed. For example, the other day there was this Cobalt Strike remote code execution vulnerability, via a clever injection of HTML code that is rendered in Java Swing components. The Monday after, I came back to work, wrote a query for that,

and found a couple of instances in other applications that were vulnerable. This was a couple of weeks ago, so we are still in the process of reporting the issues. But I think that was the latest one. Other than that, I've also been modeling some JavaScript frameworks that we use internally at GitHub, which are also open source, so these queries are going to be contributed back to the community. And I've been

surprised, if you want to put it this way, by how powerful CodeQL is, because, as I said, I've been using other static analysis solutions in the past.

With CodeQL being a declarative language, I thought it was not going to be possible to do something that wasn't even possible in other solutions; it seemed out of the possibilities of static analysis. But I wanted to implement it, so I asked the engineers, and they told me, yeah, sure, you can do that, and they taught me how to do it. It was a mind-blowing moment:

wow, if you can do this with CodeQL, you can basically model anything. You can even develop your own abstraction layers on top of the abstraction layers that are already provided, like data flow and control flow. So there is no real limit to what you can do with CodeQL, basically. You really need to get into it, because these are more advanced concepts, but if you get into CodeQL,

you can take it for granted that you will be able to model almost anything, any pattern that you may think about. I feel like CodeQL is just this amazing new static analysis tool. Is there any tool that you would compare it to?

CodeQL vs. Semgrep Comparison

Or any framework? Well, you can compare it with any of the other static analysis solutions out there. I think they serve different purposes. For example, I'm seeing a lot of traction with Semgrep these days, and I think Semgrep is a great tool as what they call a guardrails enforcer. So basically, you want to make sure...

maybe not to detect a complex SQL injection flowing from a specific user-controlled parameter through multiple layers of abstraction and interprocedural calls, back and forth through library code, etc., until the untrusted data reaches a sink. I don't think that Semgrep is apt for finding these kinds of vulnerabilities, but it is very good at enforcing things or policies like "don't use this sink."

When there is a flow through very complex code, CodeQL can get you the whole path from the source to the sink. Semgrep is more about "don't use that sink" from the start.

If they find that you're using this sink, they don't really care whether there is evidence of untrusted data flowing into it. Maybe there's a policy that you should not use this sink at all in your code. So it's a guardrail that protects you and keeps you on the road, but it's also a different approach to static analysis, located more at the other end of the spectrum that we were discussing before,

like grepping the AST and using that to enforce good practices. Also, I think that Semgrep, unlike CodeQL, doesn't require you to learn a new language. CodeQL is a completely new, declarative language that you need to learn. With Semgrep, and I haven't really used it a lot, I think you can just write your queries in the same language that you're querying. So if you are querying

PHP, you can just write Semgrep queries using PHP-ish code, right? That makes it more accessible and easier to use for people who are not willing to invest in learning a new language. The potential of the two is completely different, what you can do with CodeQL versus what you can do with Semgrep, but with Semgrep you get what it offers without the cost of learning a new language.
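The guardrail style of checking can be sketched with Python's standard `ast` module; this is only an illustration of the idea, not Semgrep's rule syntax. It flags every call to a banned function regardless of where the argument comes from. The `BANNED_CALLS` policy and the `guardrail_violations` helper are hypothetical.

```python
import ast

# Hypothetical policy: these functions must never be called, whatever the input.
BANNED_CALLS = {"eval", "exec"}


def guardrail_violations(source_code):
    """Return sorted (line, name) pairs for every direct call to a banned function."""
    hits = []
    for node in ast.walk(ast.parse(source_code)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in BANNED_CALLS):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


sample = '''
x = eval("1 + 1")          # constant input, still flagged
def handler(req):
    return exec(req.body)  # request data, also flagged
'''

print(guardrail_violations(sample))  # [(2, 'eval'), (4, 'exec')]
```

Both calls are reported, including the `eval` on a constant string. A source-to-sink taint query, assuming `req.body` were modeled as an untrusted source, would report only the `exec` call.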

Surprising CodeQL Learning Speed

That may be enough for some people. How steep would you say the learning curve is for CodeQL? Is it like, hey, I can roll up my sleeves and do it in a weekend? Or is it like two weeks of research? And so, I thought that the learning curve was very steep; that was my impression. But what I learned the other day: we were having a conversation with someone who ran a CTF based on CodeQL at their university.

They had like 30 participants, and they had to solve 10 different challenges using CodeQL. These participants hadn't used CodeQL at all before, and after three hours of competition, they were able to solve 90% of the challenges. Wow. So in these three hours they were able to... There are a couple of tutorials for CodeQL that...

So there are the learning labs that I mentioned before, about real-world vulnerabilities found with CodeQL, where you are guided to find them using CodeQL. And there are these other tutorials called the detective tutorials, where you use CodeQL as a detective to solve some crimes and some mysteries, but they are not even related to a programming language. You are using mostly pure QL, trying to figure out

and follow hints and solve some crimes. It's more like a game. And they solved the three tutorials that are available, plus modeled a number of vulnerabilities in CodeQL, in three hours. So I was really surprised. These are students from a computer science course in Germany, 30 of them. I don't know if that's a good amount to draw

conclusions from, but I was surprised because I thought the learning curve was steeper than that. That's pretty incredible. Three hours. Yeah, I know these people are used to CTF games, so they are into security and into programming languages, but they were not into CodeQL. And they were able to grasp the basics and actually model a couple of vulnerability patterns, like missing hostname verification

when checking an SSL connection, for example, and other vulnerability categories, in just three hours. I don't know, maybe I wouldn't have been able to do that back in my college days, but these new generations are smarter than we were. They're going crazy. Yeah. Awesome.

CodeQL Community and Future

I think we've gone through most of my questions. Is there anything I have forgotten to ask about CodeQL that you think is... No, I just want to take the opportunity to invite everyone who is interested to join our Slack channel. So I will send you the invite so you can share it with the audience, maybe. Yes, I will put the link in the show notes, so you will find it there.

Cool. That will be great, because there is a very friendly community in this Slack channel and you can ask any question. And if the learning curve is steeper for you than these three hours,

you will get a lot of help in this Slack channel. Also remember that there is this bug bounty program that we are promoting, because I think it's a great win-win situation: you contribute queries and you help a lot of people, because those queries are going to be run through code scanning on thousands and thousands of open source

repos, by people who are just developing their own application projects on their own and don't have a security team behind them. So by providing them with this security knowledge in the form of a CodeQL query, you are helping a lot of people. And this is really fantastic. Are there any cool future CodeQL features coming out that you're excited about? Probably the ones that I'm most excited about

are still not public in the roadmap, so I cannot comment on them. There are exciting things coming in the CodeQL world, hopefully very soon, that will make analyzing and modeling libraries and code much easier for people, especially developers. That's awesome. And how can all our listeners keep up with what you're doing? Do you have social media? What is the best way to talk to you on the internet? If you want to reach me on social media, my handle on Twitter is pwntester, so P-W-N tester,

and I'm not very active on Twitter, but I read it every day. So even though I only post from time to time, you can send me a message there and I will read it. That's basically the social network that I use the most. You can also reach me through this Slack channel, and if you want to follow the GitHub Security Lab, you can follow GH Security Lab on GitHub

or visit our website at securitylab.github.com. Awesome. We will have all that linked in the show notes. Perfect. Thank you so much for taking the time with me today and for telling us about CodeQL. I'm really interested; I'm going to go and write some queries. And yes, thank you for coming on Hacker Talk. No problem, it was a pleasure. To all our listeners, thank you for listening to this episode, and I'll see you in the future.

This transcript was generated by Metacast using AI and may contain inaccuracies.