SANS Stormcast Monday Mar 3rd: AI Training Data Leaks; MITRE Caldera Vuln; modsecurity bypass

Mar 03, 2025•7 min•Ep. 9346

Episode description

Common Crawl includes Common Leaks
The "Common Crawl" dataset, a large dataset created by spidering websites, contains, as expected, many API keys and other secrets. This data is often used to train large language models.
https://trufflesecurity.com/blog/research-finds-12-000-live-api-keys-and-passwords-in-deepseek-s-training-data
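To illustrate how secrets can be found in crawled pages, here is a minimal Python sketch of a pattern-based scan. The two patterns, the function name `find_candidate_secrets`, and the sample page are assumptions made for this example; they are not the method used in the linked research, and a real scanner would use many more detectors and would check whether a hit is still live.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a complete detector set).
PATTERNS = {
    # AWS access key IDs commonly start with AKIA or ASIA followed by 16 characters.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Generic "api_key = '...'" style assignment with a long token value.
    "generic_api_key": re.compile(
        r"\bapi[_-]?key\b\s*[:=]\s*['\"]([A-Za-z0-9_\-]{20,})['\"]",
        re.IGNORECASE,
    ),
}


def find_candidate_secrets(text):
    """Return (pattern_name, matched_text) pairs for every candidate secret in text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


# Hypothetical crawled page content; AKIAIOSFODNN7EXAMPLE is a well-known documentation placeholder.
SAMPLE_PAGE = (
    '<script>const cfg = {api_key: "abcd1234abcd1234abcd1234"};</script>\n'
    "AKIAIOSFODNN7EXAMPLE"
)

if __name__ == "__main__":
    for name, value in find_candidate_secrets(SAMPLE_PAGE):
        print(name, value)
```

A match is only a candidate: pattern scans produce false positives, so a hit still has to be checked before it is treated as a real secret.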
GitHub Repositories Exposed by Copilot
As is well known, GitHub's Copilot uses data from public GitHub repositories to train its model. However, it appears that repositories that were briefly left public and later made private have been included as well, allowing Copilot users to retrieve files from these repositories.
https://www.lasso.security/blog/lasso-major-vulnerability-in-microsoft-copilot
MITRE Caldera Framework Allows Unauthenticated Code Execution
The MITRE Caldera adversary emulation framework allows unauthenticated code execution by letting attackers specify compiler options.
https://medium.com/@mitrecaldera/mitre-caldera-security-advisory-remote-code-execution-cve-2025-27364-5f679e2e2a0e
ModSecurity Rule Bypass
Attackers may bypass the ModSecurity web application firewall by prepending zeros to encoded characters.
https://github.com/owasp-modsecurity/ModSecurity/security/advisories/GHSA-42w7-rmv5-4x2j
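The general idea behind a bypass of this kind can be sketched in Python. This is not ModSecurity's code: `naive_decode` is a hypothetical, deliberately flawed decoder that only accepts decimal entities of up to three digits, and `rule_matches` stands in for a simple filter rule. Python's `html.unescape` plays the role of a tolerant decoder, such as a browser, that accepts leading zeros.

```python
import html
import re


def naive_decode(value):
    """Hypothetical flawed decoder: only handles decimal entities of 1 to 3 digits."""
    return re.sub(r"&#(\d{1,3});", lambda m: chr(int(m.group(1))), value)


def rule_matches(decoded):
    """Stand-in for a filter rule that looks for an opening script tag."""
    return "<script" in decoded.lower()


plain = "&#60;script>"        # "<" encoded as a decimal entity
padded = "&#0000060;script>"  # same character, padded with leading zeros

if __name__ == "__main__":
    # The flawed decoder catches the plain form but leaves the padded form untouched.
    print(rule_matches(naive_decode(plain)))
    print(rule_matches(naive_decode(padded)))
    # A tolerant decoder still turns the padded entity into "<".
    print(rule_matches(html.unescape(padded)))
```

The mismatch is the point: when the filter and the consumer of the data decode the same input differently, a payload can pass the filter and still be interpreted as markup later.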
