What we want to talk about is the convergence of AI and cyber.
As you know, we're seeing an increasing threat surface, whether it's IoT, whether it's a variety of things in our infrastructure, or the advances in the digital transformation we're all talking about. With this increased threat surface, the question is: what do we do about it?
One of the tools we're thinking about using for that is AI. What we'd like to do during this discussion is have that conversation. What can we do about that?
So the first thing we're going to do, and I think you'll enjoy the panel we have, is have our panelists say a little about what they do, and then we'll get into some questions about the current state of the art.
We're going to talk about research, we're going to talk about workforce, and some other interesting issues. All right, so why don't we get started?
Well, hi everyone. I'm Megan Good, and for 21 years I've been at Leidos leading cyber technology development.
My current role is actually leading our internal research and development around cyber, AI, and software, the confluence of those three areas, and then working with external technology partners to figure out how we match what we're investing in, and what those partners have been working on, into solutions that matter for our customers across defense, the intelligence community, health, and even commercial critical infrastructure.
Thank you. Hi, good afternoon again. I'm Donald Coulter, and for two years I've been at the Department of Homeland Security Science and Technology Directorate.
Before that, I worked for the Army and came from Army Futures Command, where we helped stand up the network cross-functional team. I advised on all of the C5I R&D for the Army for the last few years before coming here, where I'm the Senior Science Advisor for Cybersecurity.
What that means is that I oversee all of our R&D and make sure it's aligned to our strategic priorities, working with our partners both within and across the Homeland Security enterprise, as well as with industry, academia, and all of our interagency partners, to make sure we are pushing a cybersecurity research agenda that is fruitful and helps inform the future resilient state of our Homeland Security enterprise.
Well, good afternoon, everybody. My name is David Carroll. I lead the mission engineering portion of the Cybersecurity Division for the Cybersecurity and Infrastructure Security Agency, or CISA.
So, to key in on my DHS S&T colleague here: I make material what they make possible, and our focus is very quickly changing to what this panel is going to talk about. My background is actually a bit strange: ten years at DHS as a MITRE guy, and I'm a former DHS chief security architect.
I came back after 10 years, after Microsoft, Google, and some work in the Navy Reserve over in bad places, and decided I wanted to get back onto the mission full time. I lead a team of a little over 200 feds and about 1,000 contractors, and we produce little-known things like Einstein.
Our portfolio is the National Cybersecurity Protection System, the Cyber Analytics and Data System, and our new Joint Collaborative Environment. And all of that, as you'll hear me say, is moving quickly from a traditional NetFlow type of analytic environment, where you're sitting there on laptops trying to find the ghost in the machine, to one where the machine is finding the ghost in the machine and we are trying to find relevance for it.
Thank you, guys. So let's just start baselining. Where do you think the challenges with cyber are right now that lend themselves to mitigation by AI?
Well, I'll go. Great question.
AI, and anybody who's seen me talk before knows this, is about collaboration. We need the data. We need to be able to cross that data over, bring it into our national security enterprise, and give the novel information that we are empowered to give. So, from my perspective, we could talk program, we could talk funding, we could talk manpower. All of those are important, but it all starts and ends for us with data.
So, to build on that: I think AI is especially important as a tool for cyber use cases. From a defensive perspective, there's so much to comprehend, and so many small pieces coming from so many different areas, that you need more automation, automating away toil across a workforce that is already burnt out and that we're trying to continually upskill over time. They need a partner, they need an assistant, and it's looking more and more like that could come from AI; generative AI in particular is an exciting opportunity there. But the other challenge is that we're creating a new attack surface as we add it in. It is about the data, but the challenge is the data too: what you can trust, what you can action, and how fast you can do all of that.
Yeah, I think you alluded to it when you talked about what you can trust. One of the key things, besides the technology, is the human who is still going to be a part of this. So figuring out how humans interact with these systems, how having systems that employ AI affects the way humans participate in the process of defending their systems and making intelligent decisions, and how we assign responsibility for the results of decisions made between human and AI teams, that's going to be critical.
So thanks, guys, that's good stage-setting. I want to start with data. What are your thoughts on how we deal with data? As you know, it typically takes 80% of your time to condition the data for these systems. What are your thoughts on dealing with that? And I know you talk about bias and those kinds of things. Let's talk about data: how we can do a better job of working with it and managing it, so that we can actually get to inference quicker.
Well, I'm trying to influence my teams. We are taking this very seriously, to the point where my traditional InfoSec team is already pivoting to ML, AIOps, and SecOps in particular, which is a hard enough shift from information assurance, FISMA, FedRAMP, that type of thing. I need to ensure the efficacy of the data, the lineage of the data, all of those things, to the point where we're having to teach the data management body of knowledge and things that are really outside the traditional. I mean, they are controls, but they're a different type of control, and I think that's where we start.
This venue has also talked about zero trust. We have to assume breach in the data, we have to assume that it comes with negative efficacy, and we have to tease that out. But we also have to be careful. There's a fine line between teasing out what we think is compromised data and generating bias in the cyber environment, because, frankly, our job most of the time is to look for the anomaly, and if we tune too much we might miss the anomaly.
And I would say one of the challenges around data is where it is, what's important, when it's important and for how long, and how to reduce the storage bill for it all. I'm sure others will talk about zero trust later, but one of the best gifts of adopting a zero trust philosophy has been gathering more data to make decisions on. And the downfall is gathering more data to make decisions on, because you have to store it, you have to keep track of it, you have to know what the fields are, you have to know all the ins and outs and what could be right and what could be wrong. And then, when you add AI on top of that, it's nice to have labels, it's nice to have things to learn from over time. Bias is something to look at, but it really helps when you have a fundamental model of what the data actually is and what it isn't, and I think that's one of the big struggles as we're working with customers and programs. Where is the data? What are the known knowns in the data? What is the new data you're experiencing over time, and how do you manage it and get your arms around it? So I actually feel like that's a problem that only keeps growing.
Yeah, I think the interesting challenge and opportunity with data is that we're collecting so much of it, but we have to be very judicious in how we use it and share it.
When I look at it from an R&D perspective, not only are we collecting a lot of operational data, but we have to figure out how we can take some of it and share it, or anonymize some of it, or what we can do to it so that we can use it and share it with our research partners as well, so that they can not only noodle on it to solve today's problems, but noodle on it to identify the risks of the future. So that's a huge challenge for us.
You know, Megan, you said this quickly, but let's take a little more time. What about data sharing? Because I do know, having been at DHS, that that's a challenge, right? So what are some of your thoughts on data sharing and how we can do a better job of it?
Oh man, it's because I said something pithy backstage.
It's your fault. All right. Well, I actually think data sharing is just hard, which we can probably all agree on. It's choose-your-hard there, but I think it's about what's important to different people at different times.
The challenge we have is that we like to package up data to be shared: we like it in PDFs, we like it to be well understood, we want it to be perfect. And that's actually not when it's useful. It's useful in its raw form, it's useful as something to share around. But how do we share what's interesting without re-sharing the whole internet?
So it's again that balance and that packaging. But I do think there's a lot we could do better, and I would be very intrigued by shared data sets. And I'm very excited about some of the capabilities coming out of the different programs to actually share data, so that we can build interesting models on top of it.
And I know there have been a few public-private partnerships along these lines over time, and we'd be curious about your thoughts on how those are going. Not to put you on the spot.
I want to put him on the spot too.
I think part of what's going to allow us to do a better job of sharing data is continuing to expand our use of privacy-preserving and privacy-enhancing technologies: really enabling people to label and identify the provenance and ownership of their data, to tag what types of people or what types of systems should be able to access it, and to encrypt and protect it in a way that gives us confidence that we can share data that is sensitive to us as a whole. In the aggregate, we can share a portion of it with a partner that shares a mission but doesn't necessarily need direct access to the raw data.
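To make that idea concrete, here is a minimal sketch of tag-based release decisions. The record fields, labels, and partner clearances are all hypothetical, invented for illustration; the point is releasing only the portion of the data whose tags a mission partner is cleared for:

```python
# Sketch: tag-based release decisions over labeled records (hypothetical tags).
from dataclasses import dataclass

@dataclass
class Record:
    payload: str
    tags: frozenset  # provenance / sensitivity labels attached by the owner

def releasable(record: Record, partner_clearances: frozenset) -> bool:
    # Release only if every tag on the record is cleared for this partner.
    return record.tags <= partner_clearances

records = [
    Record("indicator: 203.0.113.7", frozenset({"TLP:GREEN"})),
    Record("victim identity ...", frozenset({"TLP:RED", "PII"})),
]
partner = frozenset({"TLP:GREEN", "TLP:AMBER"})
print([r.payload for r in records if releasable(r, partner)])
# Only the TLP:GREEN indicator is shared; the raw sensitive record is not.
```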
Is this where I'm on the spot? All right, let's do it. A couple of things. I left in 2012 and came back 10 years later, and I learned a lot working in the financial services industry about obfuscation and devaluation of data in $100 billion banks. It's a big deal.
I think we as a federal society can learn from that and work on it. To the S&T point, I'm a big proponent. I'm actually a signer of the initial homomorphic encryption standard at MIT in 2016, I think, if I remember right; I was at Microsoft at the time.
For those who don't know what that is: it's doing calculations, operations, and analysis on data while it is encrypted. It's a very interesting technology.
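As a concrete illustration of computing on encrypted data, here is a toy Paillier cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts. The parameters are deliberately tiny and insecure, and this is a sketch of the general idea, not the standardized schemes the panel refers to:

```python
# Toy Paillier: additively homomorphic encryption (illustrative, insecure).
import random
from math import gcd

p, q = 293, 433                                  # tiny primes; real keys ~2048 bits
n, n2 = p * q, (p * q) ** 2
g = n + 1                                        # standard generator choice
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)     # lcm(p-1, q-1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)              # decryption constant

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 17, 25
# Multiplying ciphertexts adds the plaintexts: math on data while encrypted.
assert decrypt((encrypt(a) * encrypt(b)) % n2) == a + b
print(decrypt((encrypt(a) * encrypt(b)) % n2))   # 42
```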
We have various things we're working on with S&T, oblivious queries among them, things that are going to get us there without incurring the overhead of too much governance, because, as we both said, it can become cumbersome, hard to manage, and too expensive.
And the last thing I'll say is that we always have to think about whether we need the data at all. There are some who do, especially in cyber; some need it for prosecution or the like. But I don't necessarily need all the data, and to keep it. I need the signal from the data, I need to see the points of reference that run through it, I need to see the pattern. And if you combine that with the AI side, that is really what you're doing, right? You're not reading all the data with AI.
You're trying to find, mechanically, that area under the curve, and to get the efficacy to where you know you've got the pattern right. So I think all those things, combined with the advanced technologies coming out, are going to get us there.
But it's something we all have to start agreeing on and rallying around, and not snap back into going too far into let's-tag-and-label-everything and just hope for the best. That's just not going to work.
It was interesting... Can you guys hear me if I sit back this far? Okay, sorry. Better closer.
I'm trying to be comfortable.
Yeah, what was interesting is that, Dave, you brought up homomorphic encryption, but, DC, you also mentioned privacy-enhancing, privacy-enabling technologies, right? Thank you for the definition of homomorphic encryption. But, DC, could you explain some of the work S&T is doing in this privacy-enhancing space?
Yeah, so homomorphic encryption is just one part of our privacy-enhancing research. We're also looking into different approaches to differential privacy, including as part of a federated learning opportunity. That is a difficult challenge, and lots of people have been working on it.
But we do think there have been advances in computing, advances in algorithms, and advances in the way we capture and store data, and what these allow us to do is things like protected queries, where you can query a system without telling the system owner what you asked for, why you needed it, or necessarily what the answer was. So these are the types of things we're trying to tease out: being able to share data, or portions of data, with confidence that people get access only to the parts you want them to have.
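One of those privacy-enhancing approaches, differential privacy, can be sketched in a few lines. This is a generic Laplace-mechanism example with made-up numbers, not S&T's implementation: a partner gets a useful aggregate answer while any single record's contribution is masked by calibrated noise:

```python
# Minimal Laplace-mechanism sketch for a differentially private count.
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    # Noise scale is sensitivity/epsilon: smaller epsilon, stronger privacy.
    return true_count + np.random.laplace(0.0, sensitivity / epsilon)

# e.g., "how many hosts saw this indicator?" answered without exposing
# exact, attributable values from the raw data.
print(dp_count(true_count=1234, epsilon=0.5))
```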
Great. So let's go back again to first principles. Let's say you have the data, right? What do we think we can accomplish with this? Why is this going to make cybersecurity better, in some sense? Who wants to answer the question? Oh, I know.
I'll go; I'll tell a story. Listen, my director asked me at some point, you know, what's good? And I said, well, here's what's interesting: I've basically beaten space and time. And she said, what does that mean, and am I going to take your clearance?
And I went, no, no. We took a 44-hour analytic, and we watched as they were processing it, and they were doing all the right data things, everything we've been talking about. But it took 44 hours. So what's the commonality here? Well, the commonality was that in that case they weren't computer scientists, they were threat analysts.
And so they were sitting there with their hoodie and their Red Bull, looking at that thing on an eight-processor Mac. And when I got my data science team, we asked: what happens if we put 7,500 processors in a containerized environment and run it in parallel? We got the same analytic done in 16 minutes.
You know, right now, pretty much, we take a lot of our regular analytics that aren't too gruesome, and we're at about a four-hour clip to turn around a novel analytic. And I keep telling folks: in a year, it'll be one hour; in two to three years, it'll have to be instantaneous, because we're fighting against the same thing that we're trying to fix it with. As the algorithms get faster and the polymorphism of the malware gets faster, we're going to have to meet it machine on machine at some point. It's essential. We don't have a choice.
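The space-and-time story boils down to embarrassingly parallel work. Here is a minimal sketch under that assumption, with a hypothetical per-record analytic standing in for the real one: the same logic fanned out across workers scales close to linearly, which is how a 44-hour serial job can shrink to minutes:

```python
# Sketch: fanning a per-record analytic across worker processes.
from concurrent.futures import ProcessPoolExecutor

def analyze(flow: dict) -> bool:
    # Hypothetical detection logic, applied independently to each record.
    return flow.get("bytes_out", 0) > 1_000_000 and flow.get("port") == 443

def run_parallel(flows: list, workers: int) -> int:
    # Independent records mean throughput grows with worker count.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(analyze, flows, chunksize=10_000))

if __name__ == "__main__":
    flows = [{"bytes_out": i * 500, "port": 443} for i in range(100_000)]
    print(run_parallel(flows, workers=8), "flows flagged")
```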
So thanks. So, DC, I'd like to go to you, because in the green room we talked a little about human-machine teaming, right? So let's pick up from what David said. What are some of your thoughts on human-machine teaming? People talk about copilots, all this kind of stuff. What are some of your thoughts? What are some of the efforts going on?
Okay.
So one of the things we're doing in human-machine teaming is that we've set up a new ACTION AI Institute with the NSF, looking at developing intelligent cyber agents. We're building a new AI stack that looks first at strategic learning and reasoning for AI agents, then at building in human-AI agent collaboration, agent-to-agent and multi-agent collaboration, and strategic game theory, and then we intersect that across the cyber lifecycle. That intersection of humans and AI is where we think the key differentiator and problem space is, the thing that's going to make a difference in how we use these systems.
So one of the things we're doing is working with our social science group to get a better understanding of how you even measure the efficacy of a cyber operations team. How do you measure it when you have different players on a team, whether it's all humans, all digital agents, or a mixture of the two? How do you measure the efficacy of the different players in that game? And then how do you identify opportunities to improve their effectiveness?
So, as we develop those measures, and an understanding of what tweaks and changes we can make to the way we team, how much information we share with the human, and how much of the decision-making we offload to the machines, that's an area we're going to continue to shape: releasing the literature and understanding, and building it into the systems that we're not only deploying across DHS and with CISA but also opening for the public to implement.
Thanks. So, Megan, you know I'm going to turn to you now, right? Yeah, so let's talk about the AI accelerator. And as you talk about the AI accelerator, let's talk about trust as well.
Well, so it's the cyber accelerator, right? Cyber, my bad. But, as I said, cyber and AI go hand in hand.
As we've already been talking about: how do we change from where we are, in a rule-based state, writing analytics by hand with technologists, data scientists, and the like? How do we develop interesting technology that is going to change those systems fundamentally and disrupt the way we've been doing work?
Acceleration, for us, is about how we move at speed to get new tech in to solve the challenges we have today.
But one area that we're really excited about, to move on from this rule-based cyber point, is this: we trust those rules because we wrote them, because they're based on our knowledge, on what we saw and what we embedded back in. And the fact is that machines are already coming after all of our environments. The proof is there, it's evident, right? The incidents keep happening, and they're getting past the rules, just like teenagers get past your rules. They're very evasive.
So what we're learning is how to use AI to be the evasive teenager in our systems: to actually evade the rule sets we put in place, then detect that evasion on the other side, and find a way to deploy that with existing systems. That's what we've classed as adversarial AI.
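To illustrate the evasive-teenager idea at its simplest: a hand-written rule encodes exactly what we anticipated, and a trivially mutated payload slips past it. The rule and payloads here are hypothetical; adversarial-AI testing searches for such mutations automatically and at scale:

```python
# Toy rule-evasion check: a case-sensitive signature vs. a trivial mutation.
import re

RULE = re.compile(rb"cmd\.exe /c")          # hypothetical hand-written rule

def detect(payload: bytes) -> bool:
    return bool(RULE.search(payload))

print(detect(b"cmd.exe /c whoami"))         # True: the case we wrote for
print(detect(b"CMD.EXE /C whoami"))         # False: evasion the rule misses
```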
But it's hard to get to trust when it's not something you wrote, when you don't know how it's going to work, when it's something that's evolving and learning over time. So there's a lot involved there that we've been looking at over the years: how do we measure trust, how do we measure explainability, how do we measure drift in models? How do we start to look at what the outcomes are? And I love where you're going with the metrics of what is effective, because that's going to start to prove it. That just adds more of that trust.
So I'm not going to let you go on that one. Let's keep going with it. What are you finding instills trust?
So we have an approach we use called design to trust. Yeah, very clever, right? But as a name, it's a lot about the foundations of trust, and that is about people seeing the technology as it develops over time, and even more as it applies in their environment.
Um , we have the a whole method of where you start just with analysis of data and you say what would a model say about this data ?
Then Then about how do you assist them , what are some recommendations , how do you turn it up to where you're augmenting them and taking some of the work off their shoulders before you're just pressing the autonomous operation button ? And so how do you keep dialing different models up ?
That takes away the toil, it automates it away. And we're finding that through that process, you start to know: well, this is giving me really wonky results, so what do we have to tune? Right now, because things aren't labeled, to your point, we just have a lot of anomalies.
And so how do we start tuning to anomalies in real-world data? What else do we need to look to when something isn't working as expected? And I think that builds trust too: you're not just blindly saying this is the right way; it's how do you keep...
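A minimal sketch of what tuning to anomalies in unlabeled data can look like, using a generic isolation forest on synthetic numbers rather than any production model: the contamination knob is exactly the trade-off described earlier, flag too little and you miss the anomaly, flag too much and you bury the analysts in noise:

```python
# Sketch: unsupervised anomaly flagging with a tunable threshold.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, size=(1000, 4))   # synthetic normal traffic
outliers = rng.normal(6.0, 1.0, size=(10, 4))     # synthetic injected anomalies
X = np.vstack([baseline, outliers])

# contamination sets how aggressively we flag; it is the tuning knob.
model = IsolationForest(contamination=0.01, random_state=0).fit(X)
print((model.predict(X) == -1).sum(), "records flagged")  # -1 marks anomalies
```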