Python Code Scanning for AI-Generated Code

AI coding assistants generate Python that passes a visual review and fails in production. This guide maps the four tool categories that catch each failure class, gives honest catch-rate data for each, and shows how to run the full stack without the configuration cost.

The standard Python scanning stack (Bandit, Pylint, Semgrep) was designed before AI coding assistants were generating the majority of code on some teams. It catches the security patterns its rules were written for. It misses the performance anti-patterns AI assistants introduce at a high rate, and it misses hallucinated imports, a failure mode that human-written code almost never has.

This guide covers four tool categories, what each one catches in AI-generated Python specifically, and how BrassCoders bundles all four into a single scan. The benchmark numbers throughout come from BrassCoders's reproducible AI-coder bug benchmark (June 2026, version 2.0.8) — 12 AI-generated Python files with one planted bug each, run against five tools.

Why AI-Generated Python Needs a Different Stack

BrassCoders's benchmark (June 2026) found that Bandit caught 6/12 bugs and Semgrep caught 4/12. Both are solid results within each tool's design scope, and both tools missed all four performance anti-patterns that AI coding assistants reproducibly introduce. A frontier model reviewing the same files caught 12/12 when explicitly asked to review them, but raised 0 proactive warnings while generating the code.

The gap isn't a scanner deficiency. It's a scope mismatch. Bandit was designed before O(N²) string concatenation in a loop was a common code-review concern, because human engineers rarely write that pattern. AI coding assistants write it at a measurable rate because it appears in their training data (tutorials, quick scripts, demo code) and looks correct at a glance. The standard SAST scanner set was calibrated to the bugs humans write; AI-generated code introduces a different distribution of bugs.

The four categories below map to that distribution: security (covered well by existing scanners), performance (the gap), secrets and PII (partially covered, improvable), and correctness (mostly outside rule-based tools' reach).

Category 1: Security Scanners

Bandit, Pyre/Pysa, and Semgrep together cover the security surface of AI-generated Python well. In BrassCoders's benchmark, Bandit caught 4/4 security bugs and Semgrep caught 4/4. AI coding assistants reproduce SQL injection via f-string formatting, command injection via subprocess calls with shell=True, and insecure deserialization via pickle.loads() because those patterns are common in their training data.
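
A minimal sketch of the first pattern, using the standard-library sqlite3 module. The table, rows, and payload are illustrative assumptions; the point is that an f-string lets caller input rewrite the query, while a bound parameter keeps it a plain value.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Anti-pattern: input is interpolated into the SQL text itself.
    return conn.execute(f"SELECT id, name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list[tuple]:
    # Fix: pass input as a bound parameter; the driver never parses it as SQL.
    return conn.execute("SELECT id, name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

payload = "nobody' OR '1'='1"  # closes the quote and adds an always-true clause
```

With this payload, find_user_unsafe returns every row in the table, while find_user_safe returns nothing because no user is literally named "nobody' OR '1'='1".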

Bandit (github.com/PyCQA/bandit) is the Python security linter from PyCQA. It walks the AST looking for known security anti-patterns, each identified by a Bandit test ID (recent releases also attach a CWE ID to each finding). The rules it covers include SQL injection (B608), command injection (B602/B603), insecure deserialization (B301/B403), hardcoded credentials (B105/B106/B107), SSL/TLS misconfiguration (B501-B504), and dozens of other checks. Fast (seconds on typical codebases), zero configuration, deterministic.

Pyre and Pysa (pyre-check.org) are Meta's type-checker and taint analyzer for Python. Pysa traces how values flow from untrusted sources (HTTP requests, environment variables) to sensitive sinks (database queries, subprocess calls, file writes). It catches inter-procedural injection vulnerabilities that Bandit's local pattern-matching misses. Slower than Bandit (seconds to minutes depending on project size) and requires type annotations to be most effective.

Semgrep (semgrep.dev) matches arbitrary AST patterns using its own pattern language. Its OSS Python ruleset covers most of the OWASP Top 10, Django/Flask/FastAPI-specific anti-patterns, and common Python misuse patterns. Semgrep's power is custom rules — if your team has org-specific anti-patterns, Semgrep's pattern language is how you codify them.

Category 2: Performance Scanners — the Gap

BrassCoders's performance scanner caught 4/4 AI-coder perf anti-patterns in the June 2026 benchmark. Bandit, Semgrep, and Pylint each caught 0/4. The four patterns have deterministic AST signatures, so a rule-based scanner can catch them, but none of the three standard scanners tested ships rules for them, because the patterns weren't common in human-written code before AI coding assistants.

The four patterns BrassCoders's performance scanner detects:

  • O(N²) string concatenation: result += row or csv_data += line inside a loop. Python strings are immutable, so in general every concatenation allocates a new string and copies both operands. (CPython can sometimes extend the string in place, but that optimization is an implementation detail and disappears when anything else holds a reference to the string.) On 100 rows the cost is unnoticeable; on 100,000 rows the copying can dominate and the function appears to hang. AI assistants generate this pattern frequently because it reads naturally and the tutorial examples they trained on used small datasets.
  • list.insert(0, x) in a loop: building a reversed list by prepending to the front. Each insert(0, x) shifts every existing element one slot right, which is O(N) per operation and O(N²) total. The fix is list.append() followed by list.reverse(), or a collections.deque with appendleft().
  • Triple-nested loop as a join: iterating over three lists in nested for loops to find matching elements, when a dict lookup would reduce the operation from O(N³) to O(N). AI assistants generate the nested loop pattern when the data relationships aren't clear from the prompt; they optimize when explicitly asked to.
  • Unbounded while True: an accumulation loop with no timeout, no size cap, and no iteration limit. Fine for a bounded task; a resource exhaustion vector when the input is controlled by a caller who can pass an arbitrarily large dataset.
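
A minimal sketch of the four patterns next to one common fix for each. The function names, the dict shapes used in the join, and the max_items cap and None end-of-input sentinel in the bounded loop are illustrative assumptions, not part of BrassCoders.

```python
from collections import deque


def build_csv_slow(rows: list[str]) -> str:
    csv_data = ""
    for line in rows:
        csv_data += line + "\n"  # pattern 1: string += inside a loop
    return csv_data


def build_csv_fast(rows: list[str]) -> str:
    return "".join(line + "\n" for line in rows)  # build pieces, join once


def reversed_slow(items: list[int]) -> list[int]:
    out: list[int] = []
    for x in items:
        out.insert(0, x)  # pattern 2: every prepend shifts all existing elements
    return out


def reversed_fast(items: list[int]) -> list[int]:
    out: deque[int] = deque()
    for x in items:
        out.appendleft(x)  # O(1) prepend; list.append + list.reverse also works
    return list(out)


def join_slow(users, orders, products):
    result = []
    for u in users:  # pattern 3: O(N^3) nested loops used as a join
        for o in orders:
            for p in products:
                if o["user_id"] == u["id"] and o["product_id"] == p["id"]:
                    result.append((u["name"], p["title"]))
    return result


def join_fast(users, orders, products):
    users_by_id = {u["id"]: u for u in users}  # assumes ids are unique
    products_by_id = {p["id"]: p for p in products}
    return [
        (users_by_id[o["user_id"]]["name"], products_by_id[o["product_id"]]["title"])
        for o in orders
        if o["user_id"] in users_by_id and o["product_id"] in products_by_id
    ]


def accumulate_bounded(next_item, max_items: int = 10_000) -> list:
    # Pattern 4 fix: replace an unbounded `while True` with an explicit size cap.
    out = []
    while len(out) < max_items:
        item = next_item()
        if item is None:  # hypothetical end-of-input sentinel
            break
        out.append(item)
    return out
```

The slow and fast versions return the same results; only their growth rate differs. join_fast emits rows in order-list order rather than user order, so compare the two joins as sets or sorted lists.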

Each of these has a deterministic AST signature. An AST rule visits the relevant node types (augmented += assignments inside a loop body, .insert() calls with a literal index 0 inside a loop body, and so on) and emits a finding with a file path, line number, and evidence string. No inference required.

Category 3: Secrets and PII Scanners

BrassCoders's secrets stack — detect-secrets plus a custom format layer — caught 2/2 secret-format bugs in the June 2026 benchmark. AI coding assistants introduce hardcoded credentials at a meaningful rate: they generate example code with realistic-looking API keys, and developers copy-paste the example including the key literal.

detect-secrets (github.com/Yelp/detect-secrets) is Yelp's secret scanner. It combines regex pattern matching (known credential formats) with Shannon entropy scoring (high-entropy strings that look like secrets even without a known format). Its built-in plugins cover AWS access keys, GitHub tokens, Slack tokens, private keys, Stripe API keys, and other formats, roughly two dozen detectors in total.
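
The entropy half of that approach fits in a few lines. This is a simplified sketch: the 4.0-bit threshold and 20-character minimum are illustrative assumptions, and detect-secrets applies its own per-charset limits (separate Base64 and hex detectors) rather than this exact rule. The sample high-entropy string in the test is AWS's published documentation example secret key, not a live credential.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from the character frequencies in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_like_secret(token: str, threshold: float = 4.0, min_len: int = 20) -> bool:
    # Illustrative thresholds only; long, varied strings score high, repetitive ones low.
    return len(token) >= min_len and shannon_entropy(token) > threshold
```

A string of four distinct characters scores exactly 2.0 bits per character; a repeated single character scores 0.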

BrassCoders's custom secret-pattern scanner adds formats that detect-secrets doesn't ship rules for, such as Anthropic API keys (sk-ant-...), Mailgun, and DigitalOcean tokens, plus its own patterns for OpenAI API keys (sk-...), SendGrid, Twilio, and NPM publish tokens, some of which recent detect-secrets releases also cover. The combined scanner covers 20+ credential formats.
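
A format layer like this is essentially an ordered list of prefix regexes. The two patterns below are hypothetical illustrations, not BrassCoders's rules or the vendors' full key specifications. The design point worth showing is ordering: a key starting with sk-ant- also matches a generic sk- pattern, so the more specific rule has to claim the span first.

```python
import re

# Hypothetical patterns for illustration only; most specific prefix first.
SECRET_PATTERNS = [
    ("anthropic-api-key", re.compile(r"\bsk-ant-[A-Za-z0-9_\-]{20,}")),
    ("openai-api-key", re.compile(r"\bsk-[A-Za-z0-9_\-]{20,}")),
]


def classify_secrets(text: str) -> list[tuple[str, str]]:
    """Return (rule, match) pairs; each span goes to the first rule that claims it."""
    hits: list[tuple[str, str]] = []
    claimed: list[tuple[int, int]] = []
    for name, pattern in SECRET_PATTERNS:
        for m in pattern.finditer(text):
            if any(start < m.end() and m.start() < end for start, end in claimed):
                continue  # overlaps a span already attributed to a more specific rule
            claimed.append(m.span())
            hits.append((name, m.group()))
    return hits
```

The leading \b keeps the patterns from firing inside ordinary words such as "ask-me-anything".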

BrassCoders's privacy/PII scanner flags signs of personally identifiable information in source code: string literals shaped like phone numbers, SSNs, or email addresses, typically in hardcoded test fixtures, and similar signals. PII in source code is a different problem from PII in a database; the scanner targets the accidental case where a developer hardcoded a real name or real email address into a test fixture.
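
Because the scanner looks at string literals rather than runtime data, a minimal version can walk the AST and test each string constant. The two patterns here (a US SSN shape and a simple email shape) are illustrative assumptions and would miss many real-world formats.

```python
import ast
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "us-ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def find_pii_literals(source: str) -> list[tuple[int, str, str]]:
    """Return sorted (line, kind, literal) tuples for string constants that look like PII."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            for kind, pattern in PII_PATTERNS.items():
                if pattern.fullmatch(node.value):
                    hits.append((node.lineno, kind, node.value))
    return sorted(hits)
```

Using fullmatch means a literal is flagged only when the whole string has the PII shape, which keeps prose strings that merely mention an address out of the results.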

Category 4: Correctness Linters

Correctness bugs (logic errors, unguarded edge cases, type inconsistencies) are the hardest category for rule-based scanners. Pylint caught 1 of the 2 correctness bugs in the benchmark; the frontier model caught 2/2. The correctness gap is real: rule-based tools can catch specific known-wrong patterns, such as a mutable default argument, but they can't reason about arbitrary logic, such as whether the list passed to sum(x) / len(x) can ever be empty.

Pylint (pylint.readthedocs.io) is a general-purpose Python linter covering code style, naming conventions, unused variables, some type inconsistencies, and some logic errors. Its value for AI-generated code: it flags naming that violates Python conventions (which AI assistants occasionally produce) and some type errors, though Pyre's full type inference catches more of those. It is not a security or performance tool.

ast-grep (ast-grep.github.io) is a structural search-and-replace tool that works at the AST level. It's useful for correctness patterns where you can express "if the code looks like this, it's probably wrong": for example, calling .strip() on a value that was just returned by .strip(), or declaring a mutable default argument (def f(x=[])). BrassCoders bundles ast-grep for this structural pattern layer.
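
ast-grep rules are written in ast-grep's own pattern syntax. To keep the examples in one language, here is the same mutable-default check sketched with Python's standard ast module instead; it is not an ast-grep rule. It covers list, dict, and set literals only; defaults built with calls such as list() would need another rule.

```python
import ast

MUTABLE_LITERALS = (ast.List, ast.Dict, ast.Set)


def mutable_defaults(source: str) -> list[tuple[int, str]]:
    """Return (line, function name) for defs whose defaults are list/dict/set literals."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # kw_defaults holds None for keyword-only args without a default.
            defaults = node.args.defaults + [d for d in node.args.kw_defaults if d is not None]
            if any(isinstance(d, MUTABLE_LITERALS) for d in defaults):
                hits.append((node.lineno, node.name))
    return sorted(hits)


def append_item(item, bucket=[]):
    # Why the pattern is wrong: the default list is created once, at def time,
    # and shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket
```

Calling append_item(1) and then append_item(2) returns [1] and then [1, 2], because both calls mutate the same default list.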

BrassCoders's AI-pattern scanner addresses the hallucinated import problem: AI coding assistants sometimes import packages that don't exist on PyPI. The package name looks plausible: similar to a real package, slightly misspelled, or the name of a package that exists on npm but not PyPI. The import fails at runtime with an ImportError (specifically ModuleNotFoundError), but a visual review passes. BrassCoders's scanner cross-references imports against the PyPI index and flags packages that aren't resolvable.
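
A local approximation of that check can be sketched with importlib.util.find_spec. Note the assumption: this asks whether a module is importable in the current environment, not whether a distribution exists on PyPI. Import names also don't always match distribution names (the yaml module is provided by PyYAML), so a real PyPI cross-reference needs a name-mapping step that this sketch omits.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[tuple[int, str]]:
    """Return (line, top-level name) for absolute imports not importable locally."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # relative imports and non-import nodes are skipped
        for name in names:
            top = name.split(".")[0]
            try:
                found = importlib.util.find_spec(top) is not None
            except ValueError:  # e.g. a module in sys.modules whose __spec__ is None
                found = True
            if not found:
                hits.append((node.lineno, top))
    return sorted(hits)
```

Only the top-level package is checked, so a submodule import such as json.decoder resolves as long as json does.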

The Full Benchmark Table

BrassCoders's reproducible benchmark, run June 13, 2026, against 12 AI-generated Python files (brasscoders 2.0.8, claude-sonnet-4-6 as the review model, real tool output):

Tool                        | Security (4) | Performance (4) | Secrets/PII (2) | Correctness (2) | Total
BrassCoders 2.0.8           | 4/4          | 4/4             | 2/2             | 1/2             | 11/12
Claude sonnet-4-6 (review)  | 4/4          | 4/4             | 2/2             | 2/2             | 12/12
Bandit 1.8.3                | 4/4          | 0/4             | 2/2             | 0/2             | 6/12
Semgrep (OSS rules)         | 4/4          | 0/4             | 0/2             | 0/2             | 4/12
Pylint                      | 0/4          | 0/4             | 0/2             | 1/2             | 1/12

The one bug BrassCoders missed — unguarded sum(x) / len(x) with no empty-list check — is a pure logic bug. No AST rule can detect it without reasoning about input invariants. A frontier model reviewer caught it; the deterministic tools didn't. This is the honest correctness gap in rule-based scanning: it catches known-wrong patterns, not arbitrary logic errors.
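
For reference, the missed bug and one possible guard look like this. Returning a default for empty input is an assumption made for the sketch; raising an error or returning None may be the right policy, and choosing among them requires knowing the caller's invariants, which is exactly what a rule-based tool can't see.

```python
def average_unguarded(xs: list[float]) -> float:
    return sum(xs) / len(xs)  # raises ZeroDivisionError when xs is empty


def average_guarded(xs: list[float], default: float = 0.0) -> float:
    # Explicit policy for the empty case instead of an implicit crash.
    if not xs:
        return default
    return sum(xs) / len(xs)
```

Both functions agree on non-empty input; they differ only on the edge case the benchmark planted.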

Running the Full Stack Without the Configuration Cost

BrassCoders 2.0.8 bundles all four scanner categories in a single Python package with a single CLI command: Bandit, Pylint, Pyre/Pysa, Semgrep, ast-grep, and detect-secrets, plus BrassCoders's custom performance, PII/privacy, AI-pattern, secret-format, content-moderation, and JavaScript/TypeScript detectors. No per-scanner configuration files are required, and there is no version-compatibility management across six separately installed third-party tools.

pip install brasscoders
brasscoders --offline scan /path/to/your/project

Output lands in .brass/ with three files: ai_instructions.yaml (short, severity-ranked findings designed for pasting into Claude Code or Cursor), detailed_analysis.yaml (every finding with file path, line number, scanner source, and evidence), and security_report.yaml (security-only view for audit purposes).

The --offline flag makes the no-outbound-network-calls guarantee explicit and enforced: the scan runs entirely locally, and no source code leaves the machine. For regulated environments or air-gapped CI runners, this is the auditable guarantee.

Adding the Stack to CI

BrassCoders exits with code 1 on any CRITICAL finding, which is enough to fail a GitHub Actions step and block a merge. The full CI setup runs in under 60 seconds on a typical Python codebase.

name: BrassCoders Scan
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  brasscoders:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install BrassCoders
        run: pip install brasscoders==2.0.8
      - name: Run BrassCoders scan
        run: brasscoders --offline scan .
      - name: Upload .brass artifact
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: brasscoders-findings
          path: .brass/
          include-hidden-files: true

The .brass/ artifact uploaded by the workflow gives reviewers the ranked findings list without running the scan themselves. Setting GitHub branch protection to require the brasscoders check to pass before merge turns the scan from advisory into a real gate.

Adding AI-Powered Enrichment

The OSS core scan covers all four bug categories with heuristic deduplication. BrassCoders Paid ($12/month, one seat) adds an AI-powered semantic deduplication pass that reduces a typical scan's 1500+ raw findings to roughly 30 actionable ones, deduplicated against your project's signature (top-level filenames, README excerpt, dependency manifest). The enrichment runs through BrassCoders's hosted gateway, which receives only already-redacted findings, never raw source code.

For most teams the right path is: start with the OSS core, evaluate the raw output volume, and upgrade to Paid when the noise reduction becomes worth $12/month. The detection is identical at both tiers; only the output quality differs.

Frequently Asked Questions

Can I use Bandit alone to scan AI-generated Python?

Bandit catches the security half well — SQL injection, command injection, insecure deserialization, hardcoded credentials. It caught 6 of 12 bugs in BrassCoders's AI-coder benchmark, all 6 in the security and secrets categories. It caught 0 of the 4 performance anti-patterns AI coding assistants introduce because those patterns are outside Bandit's design scope. A Bandit-only CI gate leaves the performance gap open.

What are the AI-coder performance anti-patterns?

Four AST-detectable patterns AI coding assistants introduce in Python: O(N²) string concatenation (csv += row in a loop), list.insert(0, x) in a loop (O(N²) element shifts), triple-nested loops used as joins where dict lookups would make the join O(N), and unbounded while True loops with no exit condition or size cap. Each has a deterministic AST signature that BrassCoders's performance scanner catches; none has a Bandit rule.

Does BrassCoders replace Semgrep?

BrassCoders bundles Semgrep internally — you get Semgrep's OSS ruleset as part of the BrassCoders scan. If you have custom Semgrep rules, you can run BrassCoders for its 12-scanner coverage and Semgrep separately for the custom rules; the outputs don't conflict. BrassCoders replaces a standalone Semgrep invocation for the OSS ruleset.

What does the --offline flag do?

The --offline flag makes the no-outbound-network-calls guarantee explicit and enforced. BrassCoders's OSS core makes no outbound calls by default; --offline adds a hard check at runtime and exits non-zero if any network call is attempted. For regulated environments (HIPAA, SOC 2) or air-gapped CI runners, --offline is the auditable guarantee that nothing left the machine.

How does BrassCoders handle the noise from 12 scanners?

The OSS core runs a heuristic deduplication and ranking pass that reduces typical scan output from 1500+ raw findings to roughly 300-500. The Paid plan ($12/month) adds an AI-powered semantic deduplication pass against a project signature (title, README, top-level filenames) that gets typical output down to roughly 30 actionable findings. The project signature sent to the BrassCoders gateway is at most 7,500 characters of non-sensitive metadata, never raw source code.
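
A heuristic pass of this kind can be sketched as collapsing findings that different scanners report for the same location and issue. The Finding fields, the shared category names, and the severity scale are hypothetical, and this is not BrassCoders's actual deduplication or ranking logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    scanner: str
    path: str
    line: int
    category: str  # assumes each scanner's rule IDs are mapped to shared categories
    severity: int  # hypothetical scale: higher is more severe


def dedupe(findings: list[Finding]) -> list[Finding]:
    """Collapse findings sharing (path, line, category), keeping the most severe one."""
    best: dict[tuple[str, int, str], Finding] = {}
    for f in findings:
        key = (f.path, f.line, f.category)
        if key not in best or f.severity > best[key].severity:
            best[key] = f
    # Rank: most severe first, then by location for stable output.
    return sorted(best.values(), key=lambda f: (-f.severity, f.path, f.line))
```

When Bandit and Semgrep both flag the same SQL injection on the same line, only one finding survives, which is where most of the raw-count reduction in a multi-scanner run comes from.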

Can BrassCoders scan JavaScript or TypeScript alongside Python?

Yes. BrassCoders includes a JavaScript/TypeScript scanner as one of its 12 scanners. The JS/TS scanner uses a Node.js Babel parser and runs automatically when .js, .ts, .jsx, or .tsx files are present in the scanned directory. Python is the primary focus and where the AI-coder-specific perf rules apply; the JS/TS scanner handles secrets and common security patterns in JavaScript.