AI Coding Assistant Blind Spots

A 2026 coverage map of the seven categories of bug AI coding assistants systematically miss. Cited research, worked examples, and the deterministic scanner each blind spot needs.

AI coding assistants are good. They are not, however, complete. Veracode's State of Software Security 2026 found that 45% of AI-generated code samples introduce at least one OWASP Top 10 vulnerability. The Cloud Security Alliance's 2026 research note tracked a 2.74× year-over-year increase in CVEs attributed to AI-generated code, and counted monthly AI-attributed CVEs rising from 6 in January 2026 to 35 in March 2026. The detection layer most teams ship with today was not designed for that curve.

BrassCoders, the bug scanner for AI coders, was built to close the gap. This page is the canonical reference for what AI coding assistants miss, which deterministic scanner catches each category, and how to wire the two together. The intended reader is a software engineer who already ships AI-generated code and has to decide what to put underneath it.

The structure: a one-sentence summary, a taxonomy of seven blind-spot categories, a section per category with worked examples and citations, an explanation of why the misses are structural rather than random, and a practical detection guide. Each section stands on its own and can be cited independently.

What AI Coding Assistants Miss, In One Sentence

BrassCoders catches seven categories of bug AI coding assistants systematically miss: cross-file taint flows, hardcoded credentials past comment boundaries, hallucinated package imports, race conditions, context-dependent insecure patterns, auth-middleware misconfigurations, and PII leaks the AI did not trace through the call graph. Each category has a distinct root cause and a distinct deterministic scanner that catches it.

Two of these categories (secrets and hallucinated imports) already have dedicated BrassCoders blog posts with worked examples. The other five are covered here for the first time, and each will get its own supporting blog post.

The Seven Blind-Spot Categories

BrassCoders maps seven distinct categories of failure where AI coding assistants reliably miss what deterministic scanners reliably catch. The most common thread is the LLM context window: what the AI cannot see in a single forward pass, it cannot judge.

| Blind spot | Root cause | BrassCoders scanner |
| --- | --- | --- |
| Cross-file taint | Context window holds 1-2 files; taint flows across 3+ | Pyre/Pysa interprocedural taint |
| Hardcoded credentials | AI inlines plausible-looking secrets from training data | detect-secrets + 7 custom patterns |
| Hallucinated imports | LLM emits packages that do not exist on the registry | --check-package-hallucination |
| Race conditions | Requires reasoning about execution order across threads or processes | Bandit + custom AI-pattern detector (partial) |
| Context-dependent insecure patterns | Pattern is unsafe in some surrounding code, safe in others | Semgrep with project-tuned rules |
| Auth middleware gaps | Auth context lives across decorator + middleware + route | Custom AI-auth-pattern analyzer |
| PII across call paths | Sensitive data flow crosses function boundaries | BrassCoders privacy scanner + Pysa |

The next seven sections walk through each row in detail.

Blind Spot 1: Cross-File Taint and Interprocedural Bugs

BrassCoders detects taint flows that cross three or more files through Pyre's Pysa interprocedural analyzer, the same taint engine Meta uses internally on its Python codebase. AI coding assistants miss these because the LLM context window holds the file being edited and at most one or two related files, never the full call graph.

A concrete example. routes/api.ts receives a query parameter q from the user. The route handler passes q to a helper in lib/util/strings.ts, which lowercases the string and forwards it to a database wrapper in lib/db.ts. The database wrapper calls db.exec("SELECT * FROM users WHERE name = '" + input + "'"). That is a SQL injection. The LLM, reading any one of those three files in isolation, sees normal-looking code. The taint path is the bug, and the path is invisible from any single vantage point. (Because this example is TypeScript, a multi-language engine such as CodeQL is what traces it; Pysa traces the same three-hop shape in Python code.)
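A minimal Python sketch of the same three-hop shape, with the three files collapsed into functions for brevity. The function names, the users table, and the payload are illustrative assumptions, and the standard library's sqlite3 stands in for the real database wrapper. Each function looks ordinary on its own; only the composed path is injectable.

```python
import sqlite3

# Hypothetical stand-ins for routes/api, lib/util/strings, and lib/db, collapsed into one file.

def normalize(value: str) -> str:
    """The 'strings helper': looks harmless in isolation."""
    return value.lower()

def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    """The 'db wrapper': concatenates input into SQL, which is the sink."""
    return conn.execute("SELECT name FROM users WHERE name = '" + name + "'").fetchall()

def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    """Parameterized query: the driver binds name as data, never as SQL."""
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()

def handle_request(conn: sqlite3.Connection, q: str, lookup) -> list:
    """The 'route handler': q is the untrusted source."""
    return lookup(conn, normalize(q))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

payload = "' OR '1'='1"
print(handle_request(conn, payload, find_user_unsafe))  # every row comes back
print(handle_request(conn, payload, find_user_safe))    # []
```

The fix belongs at the sink: lowercasing in the helper does nothing to neutralize quotes, which is exactly why no single file looks wrong.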

ACM TOSEM 2026 evaluated Copilot review against realistic multi-file codebases and reported that the LLM "frequently fails to detect critical vulnerabilities including SQL injection, cross-site scripting, and insecure deserialization." The failures were concentrated in cases where the taint crossed file boundaries; intra-file bugs were caught at a much higher rate. That study covered Copilot, but the structural causes described later on this page apply to any LLM reviewer bounded by its context window.

The deterministic alternative is interprocedural taint analysis: trace every flow from every source (request parameter, file read, network input) to every sink (database query, shell command, HTML render) across the entire codebase. Pysa does this for Python; CodeQL does it for several languages. BrassCoders runs Pysa with a curated source-and-sink model and surfaces every flow as a finding. The AI assistant reads the finding, applies context to decide whether the flow is reachable in practice, and proposes the fix. The deterministic layer finds the flow; the AI layer judges it.
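To make "trace every flow from every source to every sink" concrete, here is a toy sketch: a hand-written call graph and a breadth-first search that reports every path from a source function to a sink. It is a deliberate simplification, not how Pysa works internally. It treats every call edge as carrying taint, whereas Pysa models which parameters and return values actually propagate it. The file and function names are hypothetical.

```python
from collections import deque

# Hypothetical call graph: caller -> callees. Real analyzers build this from the code.
CALL_GRAPH = {
    "routes/api.py:search": ["lib/util/strings.py:normalize"],
    "lib/util/strings.py:normalize": ["lib/db.py:find_user"],
    "lib/db.py:find_user": ["sink:db.execute"],
    "routes/api.py:health": ["lib/db.py:ping"],
    "lib/db.py:ping": ["sink:db.execute"],
}
SOURCES = {"routes/api.py:search"}  # functions that receive request parameters
SINKS = {"sink:db.execute"}         # calls where untrusted data becomes dangerous

def taint_paths(graph, sources, sinks):
    """Return every source-to-sink path, skipping nodes already on the current path."""
    paths = []
    for src in sorted(sources):
        queue = deque([[src]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue
            for callee in graph.get(node, []):
                if callee not in path:  # avoid infinite loops on recursive calls
                    queue.append(path + [callee])
    return paths

for path in taint_paths(CALL_GRAPH, SOURCES, SINKS):
    print(" -> ".join(path))
```

The health route reaches the same sink but is not a source, so it produces no finding. Enumerating whole paths can blow up on large graphs, which is one reason production engines summarize per function instead.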

Blind Spot 2: Hardcoded Credentials Past Comment Boundaries

BrassCoders detects 20+ secret formats (AWS access keys, GitHub PATs, OpenAI keys, Stripe live keys, Slack tokens, PEM private keys, JWTs) using Yelp's detect-secrets entropy engine plus seven custom patterns, and catches the formats AI assistants inline from training data without flagging them. AI coding assistants miss these because the AI was trained on code that contained real-looking secrets and treats credential-shaped strings as normal program text.

The failure mode is two-step. First, the AI generates code that needs an API key, fabricates a plausible-looking placeholder (sk-1234567890abcdef, or a 40-character base64 string that pattern-matches a real GitHub PAT), and inlines it as a constant. Second, the developer copies the diff, the placeholder looks like documentation, and the code ships. The placeholder is now a hardcoded credential in production code. GitGuardian's State of Secrets Sprawl report tracks the real-world rate of secret leaks across public repositories and has documented AI tooling as a contributing pattern.

The deterministic fix is entropy-based scanning across every string literal in the codebase, combined with format-specific pattern matching for the credential types you know about. detect-secrets handles the entropy side; BrassCoders adds format patterns for the credentials detect-secrets does not ship with by default. The dedicated post on secrets your AI might leak walks through a worked example.
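A minimal sketch of the two halves this paragraph describes: format-specific regexes for known credential prefixes, and a Shannon-entropy check over string literals. The three patterns, the 16-character minimum, and the 4.0 bits-per-character threshold are illustrative assumptions, not detect-secrets' plugins or defaults.

```python
import math
import re
from collections import Counter

# Illustrative format patterns for publicly documented key prefixes.
FORMAT_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "stripe_live_key": re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b"),
}
# Quoted string literals of 16+ characters on a single line.
STRING_LITERAL = re.compile(r"""(["'])([^"'\n]{16,})\1""")
ENTROPY_THRESHOLD = 4.0  # hypothetical cut-off in bits per character; tune per codebase

def shannon_entropy(s: str) -> float:
    """Average bits of information per character of s."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line number, finding type) for every suspected credential."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in FORMAT_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
        for match in STRING_LITERAL.finditer(line):
            if shannon_entropy(match.group(2)) >= ENTROPY_THRESHOLD:
                findings.append((lineno, "high_entropy_string"))
    return findings
```

Run against the placeholder from the failure-mode example, scan_source('API_KEY = "sk-1234567890abcdef"') returns [(1, 'high_entropy_string')]: nineteen distinct characters give about 4.25 bits per character, so the entropy check catches a string that matches no known format.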

Blind Spot 3: Hallucinated Package Imports

BrassCoders catches imports of packages that do not exist on PyPI, npm, or pkg.go.dev through the opt-in --check-package-hallucination flag, which issues HTTPS GETs to each registry for every imported package name. AI coding assistants miss these because the LLM emits plausible-sounding package names without ground truth against the registry.

USENIX Security 2025 documented that 19.7% of AI-recommended packages do not exist on the relevant registry. The follow-on attack pattern, slopsquatting, was demonstrated live by Lasso Security: a hallucinated huggingface-cli package they registered as a proof-of-concept received over 30,000 downloads from real developer machines before they took it down. The supply-chain implication is sharp. A malicious actor watches AI assistant output, registers the hallucinated names with malicious payloads, and waits for developers running AI-generated code to pip install them.

The deterministic detection is mechanical. Parse every import statement in the codebase. For each named package, issue an HTTPS GET to the relevant registry. If the package returns 404, flag the import. BrassCoders's --check-package-hallucination flag does exactly this; the only payload sent is the bare package name. The dedicated post on hallucinated imports walks through the attack chain and the detection in detail.

Blind Spot 4: Race Conditions and Concurrency Hazards

BrassCoders catches a subset of concurrency hazards (unsafe shared-state mutations, missing locks around critical sections, file-descriptor races) through Bandit and a custom AI-pattern detector, with the honest caveat that race conditions are the category where deterministic scanners cover the least ground. AI coding assistants miss most of these because reasoning about execution order across threads or processes requires modeling state the LLM does not represent.

The honest story here is partial coverage. The deterministic layer catches the obvious cases: a shared dictionary mutated from two threads without a lock, a temporary file opened with a predictable name, a database connection reused across requests in a way that breaks isolation. These patterns are surface-level, and pattern-matching tools catch them. What the deterministic layer misses, and what AI assistants also miss, is the subtle case: a TOCTOU race where the check and the use are several function calls apart, or a deadlock that only manifests under specific scheduling.
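Two of the surface-level cases above, sketched in Python with their standard fixes. The counter and report names are hypothetical. The unlocked read-modify-write and the hardcoded /tmp path are patterns a scanner can match syntactically (Bandit's B108 check, for instance, reports hardcoded temp directories); the subtle TOCTOU and deadlock cases have no such syntactic signature.

```python
import os
import tempfile
import threading
from collections import defaultdict

# Hypothetical shared state: hit counters updated from many worker threads.
hits = defaultdict(int)
hits_lock = threading.Lock()

def record_hit_unsafe(key: str) -> None:
    # Flagged pattern: read-modify-write on shared state with no lock held.
    hits[key] += 1

def record_hit(key: str) -> None:
    # Fix: serialize the critical section.
    with hits_lock:
        hits[key] += 1

def write_report_unsafe(data: str) -> str:
    # Flagged pattern: a predictable path another local user can pre-create or symlink.
    path = "/tmp/report.txt"
    with open(path, "w") as fh:
        fh.write(data)
    return path

def write_report(data: str) -> str:
    # Fix: mkstemp picks an unpredictable name and creates the file exclusively (O_EXCL).
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as fh:
        fh.write(data)
    return path
```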

The current best practice for this category is layered. BrassCoders catches the surface-level patterns. ThreadSanitizer and equivalents catch the dynamic cases at runtime. Code review (human or AI) catches the architectural cases. There is no single deterministic scanner that covers the category fully, and we say so on this page rather than overclaim.

Blind Spot 5: Context-Dependent Insecure Patterns

BrassCoders surfaces context-dependent patterns through Semgrep with curated rule sets, leaving the contextual judgment to the AI assistant or human reviewer downstream. AI coding assistants miss these because the same pattern is sometimes the bug and sometimes the intended behavior, and the LLM defaults to the more common reading without checking which case applies.

The canonical example is md5(). Used as the hash function in a password storage routine, MD5 is a critical bug; used as a non-cryptographic checksum for cache keys, it is fine. Same function call, same arguments, different verdict depending on the surrounding code. Semgrep finds every md5() call and reports them all. The AI assistant or human reviewer reads each one in context and makes the call.
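The same function call with two verdicts, sketched in Python. A pattern rule matching hashlib.md5(...) reports both MD5 calls below; only a reader with context can tell which one is the bug. The function names and the PBKDF2 iteration count are illustrative assumptions, and PBKDF2 is just one standard-library fix among several.

```python
import hashlib
import hmac
import os

def cache_key(url: str) -> str:
    # Non-cryptographic use: MD5 only buckets cache entries; nothing secret depends on it.
    # usedforsecurity=False (Python 3.9+) records that intent in the code itself.
    return hashlib.md5(url.encode(), usedforsecurity=False).hexdigest()

def hash_password_insecure(password: str) -> str:
    # Same function, different verdict: fast, unsalted MD5 for password storage is the bug.
    return hashlib.md5(password.encode()).hexdigest()

def hash_password(password: str, salt: bytes = b"", iterations: int = 600_000) -> tuple[bytes, bytes]:
    # Fix: salted, deliberately slow PBKDF2-HMAC-SHA256.
    salt = salt or os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes, iterations: int = 600_000) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```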

This is where the dumb-but-honest scanner pattern matters most. The wrong design demotes findings based on heuristics: name the variable cache_key and the MD5 finding gets downgraded. The right design surfaces every match and trusts the downstream layer to apply context. BrassCoders does the latter explicitly, because the heuristic approach has a worst case in which a genuinely cryptographic MD5 use is demoted silently and the bug ships.

Other context-dependent patterns include /tmp path usage (insecure on a shared host, fine inside a container), shell=True in subprocess calls (dangerous with user input, fine with hardcoded arguments), and pickle deserialization (catastrophic with untrusted data, normal for internal RPC).
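For the shell=True case, a minimal sketch of the safe alternatives when an argument is user-controlled: pass an argument list so no shell ever parses the value, or, where a shell string is unavoidable, quote it with shlex.quote (POSIX shells only). The dangerous form appears only as a comment. The child process is the current Python interpreter so the example runs anywhere; the function names are hypothetical.

```python
import shlex
import subprocess
import sys

# Dangerous with user input: subprocess.run("echo " + user_value, shell=True)
# lets a value like "x; rm -rf ~" run a second command.

def echo_arg(user_value: str) -> str:
    # Argument list, no shell: user_value reaches the child as one literal argv entry.
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_value],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def build_shell_command(user_value: str) -> str:
    # If a shell string is unavoidable, shlex.quote keeps the value a single word.
    return "echo " + shlex.quote(user_value)
```

With hardcoded arguments, shell=True carries no injection risk, which is why the finding needs a reader who can see where the value comes from.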

Blind Spot 6: Auth Middleware and Decorator Misconfigurations

BrassCoders detects missing authentication checks, weak JWT configurations, and decorator-order bugs through a custom AI-auth-pattern analyzer that reads middleware, decorators, and route handlers together. AI coding assistants miss these because authentication context is distributed across files (decorators in one, middleware setup in another, route handlers in a third), and the LLM rarely sees all three at once.

The failure mode is structural. A Flask app might require @login_required on every protected route. The AI generates a new route, puts @app.route('/admin') above the function, and skips @login_required because none of the files in its context window showed the convention. The route ships unauthenticated. The bug is the decorator that is not there, not a bug in the code that is there. A different failure: JWT verification configured to accept the 'none' algorithm in development and never tightened for production.
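Decorator order matters as much as decorator presence, and the mechanism can be shown without Flask. In this sketch, route and login_required are hypothetical stand-ins that mimic the relevant behavior of Flask's app.route (register the function it receives, then return it unchanged) and Flask-Login's login_required (wrap a view in an auth check). The decorator nearest the function runs first, so whichever object exists at registration time is the one that gets served.

```python
from functools import wraps

REGISTRY = {}  # hypothetical stand-in for the framework's URL map

def route(path):
    def register(view):
        REGISTRY[path] = view  # requests are dispatched to whatever is registered here
        return view
    return register

def login_required(view):
    @wraps(view)
    def guarded(user=None):
        if user is None:
            return "401 Unauthorized"
        return view(user)
    return guarded

@route("/settings")
@login_required          # runs first, so the guarded wrapper is what gets registered
def settings(user=None):
    return "settings page"

@login_required          # runs after registration: REGISTRY holds the bare view
@route("/billing")
def billing(user=None):
    return "billing page"

@route("/admin")         # the missing-decorator case from the paragraph above
def admin(user=None):
    return "admin page"
```

The /billing route is the decorator-order bug: login_required is present in the source but wraps a function the router never calls.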

The deterministic detection requires structural awareness of the auth surface. BrassCoders ships a custom analyzer that walks the route registration, checks for the presence of an auth decorator or middleware guard, and flags routes that lack one. The analyzer also catches the common JWT misconfigurations (none-algorithm, hardcoded secrets, missing expiration). The auth analyzer pairs naturally with Pysa's interprocedural taint to follow user identity from the request through to the database query.

Blind Spot 7: PII Flows Across Call Paths

BrassCoders detects PII flowing across call paths through the privacy scanner (PII pattern matching with Luhn-validated regex) plus Pysa's interprocedural taint configured with PII as a source and serialization endpoints as sinks. AI coding assistants miss these because privacy violations rarely look like bugs from inside any single function.

The failure mode is a multi-step flow. A function reads a user's credit card number from the request. That number gets passed to a logging helper. The logging helper writes the parameters to a log file with the unredacted card number present. The bug is a privacy violation: a card number is now in cleartext on disk. The AI generated each step in isolation and saw no problem with any of them. The composition is the violation.

BrassCoders's privacy scanner detects 10+ PII patterns (credit card, SSN, IBAN, NHS number, NINO, Aadhaar, PAN, NRIC, Medicare, TFN) using Luhn-validated regex where applicable. The scanner pairs with Pysa configured to treat the PII pattern matches as taint sources and any serialization endpoint (log call, JSON dump, file write, network send) as a sink. The combined analysis catches the multi-step flow. The deterministic findings are then redacted at the scanner so the cleartext PII never lands in the YAML output, per the two-boundary redaction policy.
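A minimal sketch of the credit-card half of that pipeline: a permissive regex finds runs of 13 to 19 digits (optionally separated by spaces or hyphens), the Luhn checksum discards runs that cannot be card numbers, and matches are redacted before the text goes anywhere. The pattern and the [REDACTED CARD] marker are illustrative assumptions, not BrassCoders' implementation.

```python
import re

# 13-19 digits, optionally separated by single spaces or hyphens (illustrative pattern).
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, subtract 9 from results over 9."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def redact_cards(text: str) -> str:
    """Replace Luhn-valid card numbers; leave other digit runs (order IDs, phones) alone."""
    def _replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        return "[REDACTED CARD]" if luhn_valid(digits) else match.group(0)
    return CARD_CANDIDATE.sub(_replace, text)
```

Applied inside a logging helper like the one in the multi-step flow above, the same function would strip the card number before anything reaches disk.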

Why The Misses Are Systematic, Not Random

BrassCoders treats AI coding assistant misses as a structural problem with three named causes: no source-of-truth grounding, context window decay with distance, and training data bias toward easy bugs. The causes are documented in the academic literature and produce the same failure modes across every major model, which is why a deterministic complement is the right architectural fix rather than a better prompt.

The first cause is the absence of ground truth. An LLM predicts plausible code from its training distribution, not correct code from a specification. When the model writes import fastapi_users_pydantic, the model is producing a token sequence with high training likelihood, not a verified import. The same generative process that produces correct code produces hallucinated code; the model has no signal that distinguishes them.

The second cause is context window decay. Even with a 200,000-token window, the model does not use every token equally well. Lost in the Middle (Liu et al. 2023) showed that LLMs use information at the start and end of a long context far more reliably than information in the middle. A taint flow that requires holding three files in mind passes through that weak middle zone, and the relevant tokens drop out of effective context. Larger windows do not fix this; they spread the same attention budget across more tokens.

The third cause is training data bias. Public code repositories overrepresent the bugs that are easy to find and underrepresent the bugs that are hard. The training corpus is full of single-file SQL injections (the AI catches these) and short on three-file taint flows (the AI misses these). The model learns to find what its training data showed it. The Pragmatic Engineer's Feb 2026 piece on AI tooling documents this asymmetry across the major commercial models. The fix is not a better model; it is a complementary deterministic layer.

The Deterministic Complement

BrassCoders is the deterministic complement to AI code review, not a competitor to it. The right architecture pairs the AI assistant's contextual judgment with a static layer's exhaustive search, because the two cover different ground and neither covers the union alone.

The division of labor is precise. The static layer (BrassCoders) finds every match for every rule across every file, deterministically and reproducibly. The same input produces the same output, every time. The AI layer (Claude Code, Cursor, Continue, Aider) reads the static findings and judges which ones matter given the full context that scanners cannot represent (intent, calling conventions, project norms, deployment context). Each layer does the work the other cannot do.

What this looks like in practice. The static layer produces a YAML file of 1,500 raw scanner findings on a typical Django codebase. The AI layer reads the file and applies enrichment: semantic deduplication of findings that name the same root cause, reranking by relevance to the project signature, demotion of findings the AI judges contextually inapplicable. The output is a ranked list of roughly 30 findings the developer actually addresses. The static layer alone produces 1,500 findings nobody reads; the AI layer alone produces 8 suggestions per file with no exhaustive backstop. Together they produce 30 findings that matter.

How To Detect The Blind Spots Today

BrassCoders ships every detector named in this guide as part of the OSS core, installable with one command and runnable against any local codebase with zero outbound network calls. The detection commands below are reproducible against any Python or JavaScript/TypeScript project.

Install:

pipx install brasscoders

Run the full scan, which covers every blind-spot category except the opt-in hallucinated-imports check:

brasscoders scan /path/to/project

Output lands at /path/to/project/.brass/ai_instructions.yaml. Hand the file to your AI assistant with this prompt:

Read .brass/ai_instructions.yaml in this project. Address the critical_issues in order. For each one, propose a diff and explain the fix.

To opt into the hallucinated-imports check (blind spot 3), which makes outbound calls to PyPI/npm/pkg.go.dev to verify imported package names:

brasscoders scan --check-package-hallucination

The OSS core covers all seven detectors. The Paid plan ($12/dev/month) adds AI-powered semantic deduplication and reranking against your project signature, turning 1,500 raw findings into roughly 30 ranked ones. The reproducible benchmarks against nine open-source codebases are at coppersun.dev/benchmarks.

Update History

This page is the canonical BrassCoders reference for what AI coding assistants miss. The categories, examples, and citations are stable; revisions are recorded below with a date stamp so external citations to specific claims can be validated against the version that made the claim.

  • 2026-06-01 — Initial publication. Seven categories, citations from Veracode 2026, CSA 2026, ACM TOSEM 2026, USENIX Security 2025, Lasso Security 2024.

Quarterly review cadence. If a category becomes outdated (a new scanner closes a gap, a new attack class emerges, a citation gets superseded), the change lands here.

Frequently Asked Questions

What do AI coding assistants miss most often?

AI coding assistants miss seven recurring categories of bug: cross-file taint flows, hardcoded credentials past comment boundaries, hallucinated package imports, race conditions, context-dependent insecure patterns, auth-middleware misconfigurations, and PII flowing across call paths the AI did not trace. BrassCoders maps each category to a deterministic scanner that catches it.

Why does Claude Code miss cross-file bugs?

Claude Code reasons within its context window, and the relevant files are rarely all in effective context at once. A SQL-injection sink that lives in lib/db.ts but receives untrusted input from routes/api.ts via a helper in lib/util/strings.ts crosses three files. The LLM typically judges the file in front of it and never reconstructs the taint path. Interprocedural static analyzers (Pysa for Python, CodeQL for multi-language) trace the path deterministically and catch the bug.

Can AI coding assistants detect SQL injection?

Sometimes, when the injection sink and the user-controlled input live in the same file. ACM TOSEM 2026 found Copilot review frequently fails to detect SQL injection, cross-site scripting, and insecure deserialization across realistic multi-file codebases. The detection rate drops sharply when taint crosses file boundaries.

What is slopsquatting?

Slopsquatting is the supply-chain attack pattern where a malicious actor registers a package name that AI coding assistants hallucinate. USENIX Security 2025 documented that 19.7% of AI-recommended packages do not exist on PyPI or npm. Lasso Security demonstrated the attack live: a hallucinated huggingface-cli package they registered as a proof-of-concept received over 30,000 downloads. BrassCoders detects hallucinated imports before pip install runs.

Are AI-generated CVEs growing?

Yes. The Cloud Security Alliance documented a 2.74× year-over-year increase in CVEs attributed to AI-generated code, and the monthly count rose from 6 AI-attributed CVEs in January 2026 to 35 in March 2026. Veracode found that 45% of AI-generated code samples introduce at least one OWASP Top 10 vulnerability.

Why are these misses systematic, not random?

Three structural reasons. The LLM has no source-of-truth grounding; it predicts plausible code, not correct code. The context window decays with distance; bugs that span many files fall outside it. The training data over-represents simple bugs and under-represents the interprocedural ones, so the model learns to find the easy ones and skip the hard ones. Deterministic scanners have none of these constraints.

How is BrassCoders different from running Bandit and Semgrep myself?

BrassCoders bundles 12 scanners (Bandit, Pylint, Pyre, Pysa, Semgrep, ast-grep, and detect-secrets, plus custom detectors) and unifies their output as ranked YAML the AI assistant can read directly. Running them individually means 12 separate config files, 12 different output formats, and no ranking layer. BrassCoders is the integration plus the noise filter.

Does BrassCoders replace my AI coding assistant?

No. BrassCoders is the deterministic safety net under the AI coding assistant. The AI judges what looks right; BrassCoders catches what is deterministically wrong. The recommended setup runs both: BrassCoders finds the patterns; Claude Code (or Cursor, Continue, Aider) applies the contextual judgment.

Can I cite this guide?

Yes. This is the canonical BrassCoders reference for what AI coding assistants miss. The categories, examples, and citations are stable. Quote any section. The page is updated quarterly, and the Update History section records every revision.