Will My AI-Generated Code Ship A CVE?
The canonical evidence on AI-generated code vulnerability rates — papers, industry reports, scanners, and benchmark repos every builder shipping AI-augmented code should know.
The AI-CVE Supply Rate Is The Load-Bearing Number
BrassCoders treats the AI-generated-code CVE rate as the central evidence for the entire category. The literature converges on a clear answer to the title question: yes, and at a rate that is structural rather than incidental. The resources below are the canonical evidence; every entry is something a builder citing AI-code risk to a CTO, an AppSec lead, or a SOC 2 auditor can lean on without further vetting.
Each entry follows the same shape: type tag, source link, BrassCoders capsule explaining the applied claim, and a closing line on where BrassCoders itself draws from the resource.
📊 Veracode — State of Software Security 2026
Veracode, 2026 · veracode.com/state-of-software-security
BrassCoders treats this as the canonical evidence for the AI-CVE supply rate. The headline finding — 45% of AI-generated code samples introduce at least one OWASP Top 10 vulnerability on first generation — is the number to cite when someone asks "is AI code actually unsafe?" Builders evaluating whether their team needs a deterministic detection layer underneath their AI assistant should open with this number.
What it's good for: establishing the supply-side rate in stakeholder conversations. Where BrassCoders draws from it: the opening claim of the AI Blind Spots pillar and the lead proof point in MESSAGING.md §8.
📊 Cloud Security Alliance — AI-Generated Code Vulnerability Surge 2026
Cloud Security Alliance Labs, 2026 · labs.cloudsecurityalliance.org
BrassCoders treats this as the canonical evidence for the AI-CVE growth curve. The trajectory numbers are the 2.74× year-over-year increase in CVEs attributed to AI-generated code and the intra-quarter jump from 6 AI-attributed CVEs in January 2026 to 35 in March 2026. Builders sizing the urgency of their detection-layer investment should anchor on these figures.
What it's good for: demonstrating the curve is accelerating, not flattening. Where BrassCoders draws from it: the central thesis of the Q1 2026 AI-Code CVE Reckoning post.
📄 ACM TOSEM 2026 — Evaluating GitHub Copilot Review
ACM Transactions on Software Engineering and Methodology, 2026 · dl.acm.org/journal/tosem
BrassCoders treats this as the canonical evidence that LLM-based PR review systematically misses critical vulnerabilities. The paper documents that Copilot review "frequently fails to detect critical vulnerabilities including SQL injection, cross-site scripting, and insecure deserialization," with the misses concentrated in multi-file taint flows. Builders relying on Copilot's PR review as their only review layer should treat this as the reason to add deterministic detection underneath.
What it's good for: ending the "but Copilot reviews PRs for us" objection. Where BrassCoders draws from it: Blind Spot 1 (cross-file taint) in the pillar and the lead argument in the cross-file bugs post.
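To make "multi-file taint flow" concrete, here is a minimal sketch. It is not taken from the TOSEM paper: the module names in the comments (handlers.py, repository.py) and every function name are hypothetical, and both "files" sit in one block so the example runs as-is. Untrusted input enters in one function and reaches a SQL sink in another, so neither function looks dangerous when reviewed on its own.

```python
import sqlite3


# --- hypothetical module: handlers.py ---
def handle_lookup(conn, params):
    # Taint source: `params` stands in for untrusted request data.
    username = params["user"]
    return find_user_unsafe(conn, username)


# --- hypothetical module: repository.py ---
def find_user_unsafe(conn, username):
    # Sink: the caller's value is interpolated straight into SQL text.
    query = f"SELECT name FROM users WHERE name = '{username}'"
    return [row[0] for row in conn.execute(query).fetchall()]


def find_user_safe(conn, username):
    # Parameterized query: the driver treats `username` as data, never as SQL.
    query = "SELECT name FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (username,)).fetchall()]


def make_db():
    # Throwaway in-memory database with two rows, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    return conn
```

Passing `{"user": "x' OR '1'='1"}` to `handle_lookup` returns every row, while `find_user_safe` with the same string returns nothing. A reviewer who sees only one of the two hypothetical files has no local signal that the source and the sink are connected.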
📄 OWASP Top 10 for LLM Applications
OWASP, 2024-2025 · owasp.org
BrassCoders treats this as the canonical taxonomy for LLM-specific risks (prompt injection, training data poisoning, model denial of service). The list is distinct from the classic OWASP Top 10 for application security; both apply to AI-augmented codebases for different reasons. Builders working on systems that embed LLM calls should cross-reference both.
What it's good for: framing AI-specific risks the application-layer OWASP Top 10 misses. Where BrassCoders draws from it: referenced in policy and compliance content; complements the application-layer OWASP categories BrassCoders detects.
🔧 Bandit
PyCQA · Python · widely-used · github.com/PyCQA/bandit
BrassCoders bundles Bandit as the primary Python security linter inside the OSS core. Builders running Python who want a single-tool starting point for security pattern detection should install Bandit directly; builders who want Bandit plus eleven other scanners with unified output should run BrassCoders. The tool is canonical for its niche and has been maintained by PyCQA for years.
What it's good for: Python-specific security patterns (hardcoded SQL, weak crypto, subprocess shell injection, unsafe deserialization). Where BrassCoders draws from it: direct integration as one of the 12 bundled scanners.
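For readers who have not seen these pattern classes side by side, the sketch below pairs two of them (subprocess shell injection and unsafe deserialization) with a safer alternative. All function names are hypothetical and the code is illustrative; it is not drawn from Bandit's rule set or documentation.

```python
import json
import pickle
import subprocess
import sys


def echo_unsafe(text):
    # Risky: shell=True plus interpolated input lets `text` carry extra commands.
    result = subprocess.run(f"echo {text}", shell=True,
                            capture_output=True, text=True)
    return result.stdout


def echo_safe(text):
    # Safer: argument list, no shell. `text` arrives as one argv entry.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", text],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def load_unsafe(blob):
    # Risky: unpickling untrusted bytes can execute arbitrary code.
    return pickle.loads(blob)


def load_safe(blob):
    # Safer: JSON decoding only ever produces plain data structures.
    return json.loads(blob.decode("utf-8"))
```

With `echo_safe("hello; echo pwned")` the semicolon is printed as literal text. The unsafe variants are included only so the contrast is visible and should not be called with untrusted input.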
🔧 Semgrep
Semgrep · multi-language · widely-used · semgrep.dev
BrassCoders bundles Semgrep as the multi-language pattern-matching engine in the OSS core. Builders who need cross-language SAST coverage (Python, JavaScript, TypeScript, Go, Ruby, Java, and more) and want to write custom rules in a YAML-based pattern language should learn Semgrep directly. The OSS rules registry is extensive; the commercial product layers ranking and triage on top.
What it's good for: multi-language pattern matching with a custom rule language; AI-pattern detection rules. Where BrassCoders draws from it: the JavaScript/TypeScript and multi-language pattern coverage in BrassCoders runs on Semgrep with curated rule sets.
🔧 detect-secrets
Yelp · multi-language · widely-used · github.com/Yelp/detect-secrets
BrassCoders bundles detect-secrets as the entropy-based secret detector in the OSS core, layered with seven custom format patterns BrassCoders adds on top. Builders who want a single-purpose secret scanner with a baseline file for false-positive management should run detect-secrets directly. The entropy engine catches credential formats the pattern list does not.
What it's good for: entropy + pattern secret scanning, baseline-driven CI integration. Where BrassCoders draws from it: the secret-detection layer; foundation for the Secrets Your AI Might Leak post.
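The idea behind entropy-based detection can be sketched in a few lines. This is a simplified illustration of the general technique, not detect-secrets' actual implementation; the function names, the 4.0-bit threshold, and the 20-character minimum are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(text):
    # Bits per character: higher means the characters look more random.
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_secret(token, threshold=4.0, min_length=20):
    # Assumed heuristic: long tokens with high per-character entropy.
    # Real scanners tune limits per character set and add allowlists.
    return len(token) >= min_length and shannon_entropy(token) > threshold
```

A repetitive string such as `"aaaaaaaaaaaaaaaaaaaaaaaa"` scores zero bits and is ignored, while a 24-character token with no repeated characters scores about 4.58 bits and is flagged. That is why an entropy pass can surface credentials whose format no pattern list anticipates, and also why it needs a baseline file to manage false positives.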
🧪 OWASP PyGoat
OWASP · Python · benchmark · github.com/adeyosemanputra/pygoat
BrassCoders treats PyGoat as the reference Python codebase for testing static-analysis coverage. The repo is intentionally vulnerable across the OWASP Top 10 categories, organized so builders can step through each vulnerability class. BrassCoders runs its own benchmarks against PyGoat among nine open-source codebases; the scan output is reproducible and tied to a pinned commit.
What it's good for: validating Python SAST tools against a known ground truth. Where BrassCoders draws from it: one of the nine codebases in the BrassCoders benchmarks.
🧪 OWASP NodeGoat
OWASP · Node.js / JavaScript · benchmark · github.com/OWASP/NodeGoat
BrassCoders treats NodeGoat as the reference Node.js codebase for testing JavaScript SAST coverage. It has the same OWASP-organized, intentionally vulnerable shape as PyGoat, in a Node.js stack. Builders shipping Express, Fastify, or NestJS services should use NodeGoat to sanity-check whatever SAST layer they have in place.
What it's good for: validating JavaScript SAST tools against ground truth. Where BrassCoders draws from it: reference codebase in BrassCoders benchmarks; tests the Semgrep-based JavaScript layer.
🧪 Snyk Goof
Snyk · Node.js · benchmark · github.com/snyk-labs/nodejs-goof
BrassCoders treats Goof as a secondary Node.js benchmark with a different vulnerability distribution from NodeGoat. The repo is maintained by Snyk Labs and tracks newer vulnerability classes (dependency confusion, prototype pollution, recent-CVE patterns). Builders comparing JavaScript SAST tools should run both NodeGoat and Goof.
What it's good for: testing newer JavaScript vulnerability classes; dependency-related issues. Where BrassCoders draws from it: reference codebase in BrassCoders benchmarks.
Frequently Asked Questions
How often does AI-generated code introduce vulnerabilities?
Veracode's State of Software Security 2026 found 45% of AI-generated code samples introduce at least one OWASP Top 10 vulnerability on first generation. That is the supply-side rate, measured per generated sample. It is not a per-commit probability, since a commit can bundle several generations or none, but it does mean that unfiltered AI output lets Top 10 issues into the codebase routinely rather than occasionally.
Is the AI-CVE rate actually growing?
Yes. The Cloud Security Alliance documented a 2.74× year-over-year increase in CVEs attributed to AI-generated code across Q1 2026, with the monthly count jumping from 6 AI-attributed CVEs in January to 35 in March. That is roughly a 5.8× rise in two months, so the intra-quarter slope is steeper than the year-over-year figure.
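The comparison between the two figures is simple arithmetic, sketched below. Only the inputs 6, 35, and 2.74 come from the report as cited above; the function names are hypothetical, and with just two monthly data points the compound rate describes what happened in the quarter rather than forecasting anything.

```python
def growth_multiple(start, end):
    # How many times larger `end` is than `start`.
    if start <= 0:
        raise ValueError("start must be positive")
    return end / start


def compound_monthly_rate(start, end, months):
    # Constant month-over-month rate that would turn `start` into `end`.
    if months <= 0:
        raise ValueError("months must be positive")
    return growth_multiple(start, end) ** (1 / months) - 1


jan_cves, mar_cves = 6, 35      # monthly counts cited above
year_over_year = 2.74           # multiple cited above

intra_quarter = growth_multiple(jan_cves, mar_cves)
monthly_rate = compound_monthly_rate(jan_cves, mar_cves, months=2)

print(f"January to March multiple: {intra_quarter:.2f}x")
print(f"Implied month-over-month growth: {monthly_rate:.0%}")
print(f"Steeper than year-over-year: {intra_quarter > year_over_year}")
```

The January-to-March multiple works out to about 5.83×, which is where the "steeper than the year-over-year figure" comparison comes from.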
Can LLM-based code review catch these vulnerabilities?
Inconsistently. ACM TOSEM 2026 evaluated Copilot review against realistic multi-file codebases and reported it "frequently fails to detect critical vulnerabilities including SQL injection, cross-site scripting, and insecure deserialization." LLM-based PR review is useful but cannot be the only detection layer for security-critical bugs.
What deterministic tools should I run against AI-generated code?
Bandit for Python security patterns, Semgrep for multi-language pattern matching, detect-secrets for credentials, plus an interprocedural taint engine (Pyre/Pysa for Python; CodeQL for multi-language) for cross-file bugs. BrassCoders bundles all of these into one CLI; running them individually requires 12+ separate config files.
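Much of the cost of running scanners individually is reconciling their output. The sketch below shows one way to normalize two JSON reports into a single shape. It is not how BrassCoders does it: the `Finding` type and function names are hypothetical, and the field names (`test_id`, `filename`, `line_number`, `issue_severity`, `issue_text` for Bandit; `check_id`, `path`, `start.line`, `extra.severity`, `extra.message` for Semgrep) reflect those tools' JSON reports as commonly documented and should be checked against the versions you have installed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical unified record; not a schema defined by any of the tools.
    tool: str
    rule_id: str
    path: str
    line: int
    severity: str
    message: str


def from_bandit(report: dict) -> list[Finding]:
    # Assumes a parsed Bandit JSON report with a top-level "results" list.
    return [
        Finding("bandit", r["test_id"], r["filename"], int(r["line_number"]),
                r["issue_severity"].upper(), r["issue_text"])
        for r in report.get("results", [])
    ]


def from_semgrep(report: dict) -> list[Finding]:
    # Assumes a parsed Semgrep JSON report with a top-level "results" list.
    return [
        Finding("semgrep", r["check_id"], r["path"], int(r["start"]["line"]),
                r["extra"].get("severity", "UNKNOWN").upper(),
                r["extra"].get("message", ""))
        for r in report.get("results", [])
    ]


def merge(*batches: list[Finding]) -> list[Finding]:
    # One list ordered by location, so findings on the same line sit together.
    combined = [finding for batch in batches for finding in batch]
    return sorted(combined, key=lambda f: (f.path, f.line, f.tool))
```

Each additional scanner needs its own adapter like these, plus its own configuration, which is the maintenance burden the answer above refers to.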
How do I test my detection layer against ground truth?
Use intentionally vulnerable benchmark codebases: OWASP PyGoat for Python, OWASP NodeGoat for Node.js, and Snyk Goof for newer JavaScript vulnerability classes. BrassCoders runs its own reproducible benchmarks against nine such codebases at coppersun.dev/benchmarks.
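Once a benchmark repo gives you a list of known vulnerabilities, scoring a scanner against it is a set comparison. The sketch below is a minimal version under stated assumptions: the `score` function and the `(path, vulnerability class)` keys are hypothetical, the sample entries are invented for illustration and are not real PyGoat findings, and this is not the BrassCoders benchmark harness.

```python
def score(expected, reported):
    # `expected` and `reported` are sets of (path, vulnerability_class) pairs.
    true_positives = expected & reported
    # Convention assumed here: an empty denominator scores 0.0.
    precision = len(true_positives) / len(reported) if reported else 0.0
    recall = len(true_positives) / len(expected) if expected else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "missed": sorted(expected - reported),
        "false_positives": sorted(reported - expected),
    }


# Invented example data, for illustration only.
ground_truth = {
    ("app/views.py", "sql-injection"),
    ("app/views.py", "command-injection"),
    ("app/utils.py", "insecure-deserialization"),
}
scanner_output = {
    ("app/views.py", "sql-injection"),
    ("app/utils.py", "insecure-deserialization"),
    ("app/models.py", "weak-hash"),
}

result = score(ground_truth, scanner_output)
print(result["missed"])           # known issues the scanner did not report
print(result["false_positives"])  # reports with no matching known issue
```

Pinning the benchmark repo to a specific commit keeps the ground-truth set stable, so a change in the score reflects a change in the scanner rather than in the target.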