How BrassCoders Catches Slow AI-Generated Code

AI assistants write O(N²) loops that pass every test and crawl at scale. BrassCoders flags four such patterns that Bandit and Semgrep miss.

Copper Sun Brass Team · 4 min read
benchmarks · oss-core

AI assistants write code that passes every test and crawls in production. A 2024 efficiency benchmark, ENAMEL, found frontier models reach only a fraction of expert-level efficiency even when their output is functionally correct. The failure is quiet: the unit test is green, the reviewer sees clean code, and the O(N²) loop only shows itself at volume. BrassCoders’s performance scanner catches four of these patterns straight from the source, before the code ever runs.

The Four Patterns BrassCoders Flags

BrassCoders’s performance scanner detects four AST-signatured slow patterns that AI assistants reproduce: quadratic string building, prepend-in-a-loop, nested-loop joins, and unbounded while True reads. It caught all four in the June 2026 benchmark, where the standard security and lint tools caught none.

  • O(N²) string concatenation: result += row inside a loop. Python strings are immutable, so a concatenation generally allocates a new string and copies both halves. CPython can sometimes extend the string in place, but that is an implementation detail, not a guarantee. Unnoticeable on 100 rows; the function hangs on 100,000.
  • list.insert(0) in a loop: building a list by prepending. Each insert shifts every existing element right, turning an O(N) build into O(N²). The fix is append then reverse, or a collections.deque with appendleft.
  • Triple-nested loop as a join: an O(N³) scan over three lists to find matches when dict lookups would make it O(N). AI assistants reach for the nested loop when the data relationships aren’t spelled out in the prompt.
  • Unbounded while True: an accumulation loop with no timeout, size cap, or iteration limit. Fine for a bounded task; a resource-exhaustion vector when a caller controls the input size.
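
To make the four shapes concrete, here is a minimal sketch that pairs each slow form with a common linear-time alternative. The function names, the sample data layout, and the max_chunks cap are hypothetical choices for this illustration; none of them come from BrassCoders or its benchmark.

```python
from collections import deque


# 1. Quadratic string building vs. a single join.
def build_csv_slow(rows):
    result = ""
    for row in rows:
        result += row + "\n"  # may copy the whole accumulated string each pass
    return result


def build_csv_fast(rows):
    return "".join(row + "\n" for row in rows)


# 2. Prepend-in-a-loop vs. deque.appendleft.
def newest_first_slow(events):
    out = []
    for event in events:
        out.insert(0, event)  # shifts every existing element right
    return out


def newest_first_fast(events):
    out = deque()
    for event in events:
        out.appendleft(event)  # O(1) at the left end
    return list(out)


# 3. Triple-nested loop as a join vs. dict lookups.
# Assumes user ids and order ids are unique (an assumption of this sketch).
def join_slow(users, orders, items):
    matches = []
    for user_id, name in users:
        for order_id, order_user in orders:
            for item_order, sku in items:
                if user_id == order_user and order_id == item_order:
                    matches.append((name, sku))
    return matches


def join_fast(users, orders, items):
    name_by_user = dict(users)    # user_id -> name
    user_by_order = dict(orders)  # order_id -> user_id
    matches = []
    for item_order, sku in items:
        user_id = user_by_order.get(item_order)
        if user_id in name_by_user:
            matches.append((name_by_user[user_id], sku))
    return matches


# 4. Unbounded `while True` read vs. a loop with an iteration cap.
def read_all_bounded(read_chunk, max_chunks=1000):
    chunks = []
    for _ in range(max_chunks):  # replaces `while True`
        chunk = read_chunk()
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)
    raise RuntimeError("input exceeded max_chunks")
```

The fast join returns the same pairs as the nested version but in item order rather than user order, so compare the two as sorted lists if ordering matters.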

The full field guide to how these show up in real AI-generated code is in AI-Coder Performance Bugs in the Wild.

Why the Standard Scanners Miss Them

BrassCoders treats the performance gap as a scope mismatch, not a scanner deficiency. Bandit, Semgrep, and Pylint were calibrated on human-written code, and a human engineer rarely writes csv += line in a hot loop. The pattern wasn’t a common review concern, so the standard tools never grew a rule for it.

AI assistants changed the input distribution. They reproduce these shapes from training data full of tutorials and demo scripts that used small datasets, where the slow pattern reads naturally and runs fine. The research backs the pattern: EffiBench and Mercury both measure generated code consuming far more time and memory than the canonical efficient solution while still passing the functional tests. Correct-but-slow is the default the benchmarks find, not the exception.

Deterministic, Not Profiled

BrassCoders catches these patterns from the abstract syntax tree, before the code runs, because each one has a fixed structural signature. The scanner visits the node types that define the pattern — a concatenation operator whose target is reassigned inside a For body, an insert(0) call inside a loop — and emits a finding with the file, the line, and the evidence string. No workload, no profiler, no model deciding what looks slow.
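
As a rough illustration of what a structural rule like this can look like, the sketch below uses Python's standard ast module to flag two of the shapes. It is not BrassCoders's implementation: the class name, the rule names, and the heuristic of treating a name as a string only when it was assigned a string literal are all assumptions made for this example.

```python
import ast


class SlowPatternVisitor(ast.NodeVisitor):
    """Hypothetical detector for two loop shapes; illustrative only."""

    def __init__(self, filename="<string>"):
        self.filename = filename
        self.loop_depth = 0
        self.str_names = set()  # names assigned a string literal so far
        self.findings = []

    def _visit_loop(self, node):
        self.loop_depth += 1
        self.generic_visit(node)
        self.loop_depth -= 1

    visit_For = _visit_loop
    visit_While = _visit_loop

    def visit_Assign(self, node):
        # Remember `name = "literal"` so `name += ...` can be treated as str.
        if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.str_names.add(target.id)
        self.generic_visit(node)

    def visit_AugAssign(self, node):
        if (
            self.loop_depth
            and isinstance(node.op, ast.Add)
            and isinstance(node.target, ast.Name)
            and node.target.id in self.str_names
        ):
            self._report(node, "str-concat-in-loop")
        self.generic_visit(node)

    def visit_Call(self, node):
        func = node.func
        if (
            self.loop_depth
            and isinstance(func, ast.Attribute)
            and func.attr == "insert"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and type(node.args[0].value) is int
            and node.args[0].value == 0
        ):
            self._report(node, "insert-zero-in-loop")
        self.generic_visit(node)

    def _report(self, node, rule):
        # File, line, rule, and the source text as evidence.
        self.findings.append((self.filename, node.lineno, rule, ast.unparse(node)))


def scan_source(source, filename="<string>"):
    visitor = SlowPatternVisitor(filename)
    visitor.visit(ast.parse(source, filename=filename))
    return visitor.findings


SAMPLE = '''\
def render(rows):
    result = ""
    out = []
    total = 0
    for row in rows:
        result += row
        out.insert(0, row)
        total += 1
    return result, out, total
'''

for finding in scan_source(SAMPLE):
    print(finding)
```

On the sample, the sketch reports result += row on line 6 and out.insert(0, row) on line 7, and it skips total += 1 because total was never assigned a string literal. A production rule would need real type information and more node types; the point here is only that the check reads source, never runs it, and returns the same findings each time.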

That determinism is what makes it a pre-merge gate. A profiler needs a running process and a large enough input to surface the hotspot, which usually means production. An AST rule needs only the source, so the finding lands in CI on the pull request, before anyone ships. The same file produces the same finding on every run.

What It Catches, and What It Doesn’t

BrassCoders flags the four signatured patterns; it does not claim to find arbitrary algorithmic inefficiency. A novel O(N²) buried in a custom data structure has no fixed signature, so no deterministic rule will catch it — that’s a judgment call for a profiler under load or a reviewer reading the logic. BrassCoders reports the patterns it can prove from the structure and leaves the open-ended performance review to you.

The honest pairing is scanner plus profiler. BrassCoders catches the four known anti-patterns early and cheaply, on every scan. A sampling profiler like py-spy or Scalene then confirms the real cost of a flagged hotspot under load, and surfaces the slow code that doesn’t match a known shape. The reproducible head-to-head against a frontier model is in the AI-coder bug benchmark, and the underlying efficiency studies are summarized in the performance anti-patterns research.
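
For a sense of the profiler half of that pairing, here is a small sketch that uses the standard library's cProfile as a stand-in. cProfile is a deterministic tracing profiler, not a sampling one like py-spy or Scalene, and the function names and input size below are hypothetical; the idea is only to show a flagged shape being measured on a larger input.

```python
import cProfile
import io
import pstats


def prepend_all(n):
    """The flagged shape: list.insert(0, ...) in a loop."""
    out = []
    for i in range(n):
        out.insert(0, i)
    return out


def append_all(n):
    """The usual fix: append, then reverse once."""
    out = []
    for i in range(n):
        out.append(i)
    out.reverse()
    return out


def profile_call(func, *args):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


_, report = profile_call(prepend_all, 20_000)
print(report)
```

Running the same helper on append_all with the same n gives a report to compare against. Absolute timings depend on the machine, so treat the comparison as directional.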

Run It

The performance scanner runs automatically in every scan — there’s no flag to remember. Install BrassCoders and scan:

pipx install brasscoders
brasscoders --offline scan

Performance findings land in .brass/ai_instructions.yaml next to the security, secrets, and correctness findings, ready to hand to Claude Code or Cursor. For the full set of detectors that run in the same pass, see what BrassCoders detects.

Frequently Asked Questions

What performance bugs do AI assistants write?

Four AST-detectable patterns recur: O(N²) string concatenation in a loop, list.insert(0) in a loop, a triple-nested loop used as a join, and an unbounded while-True read. Each runs fine on small inputs and degrades at volume. BrassCoders's performance scanner caught all four in its June 2026 benchmark; Bandit, Semgrep, and Pylint caught zero.

Why do security scanners miss performance bugs?

Scope, not deficiency. Bandit and Semgrep were calibrated for human-written security flaws, and O(N²) string building in a loop wasn't a common review concern because human engineers rarely write it. AI assistants reproduce it from tutorial-style training data, so the standard SAST set has no rule for it. BrassCoders adds the rules.

Can a deterministic scanner really catch performance bugs?

For these four, yes — each has a fixed AST signature. BrassCoders's scanner visits the relevant node types, like a string-concatenation operator inside a For loop or an insert(0) call inside a loop, and emits a finding with file, line, and evidence. No profiling run and no model inference; the same code produces the same flags.

Doesn't a passing unit test mean the code is fast enough?

No. A unit test checks correctness on a small input, where an O(N²) loop and an O(N) loop are indistinguishable. The cost only appears at volume, which a small test never exercises. That's why these bugs survive review and ship: the code is correct, the test is green, and the slowdown is invisible until production.
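
A minimal sketch of why the small test cannot tell the two apart: both builders below return the same list, and the only difference is how many element moves the prepend version performs. The function names and the shift counter are hypothetical, added purely to make the cost visible without timing anything.

```python
def prepend_build(items):
    """O(N^2) shape. Returns (result, number of elements shifted)."""
    out = []
    shifted = 0
    for item in items:
        shifted += len(out)  # insert(0) moves every element already in the list
        out.insert(0, item)
    return out, shifted


def append_build(items):
    """O(N) shape: append, then reverse once."""
    out = []
    for item in items:
        out.append(item)
    out.reverse()
    return out


# A typical small unit test: both versions pass.
small = [1, 2, 3]
slow_result, shifts = prepend_build(small)
assert slow_result == append_build(small) == [3, 2, 1]
print("shifts for 3 items:", shifts)
```

The shift count for n items is n(n-1)/2: 3 for the three-item test, 4,950 for 100 rows, and 4,999,950,000 for 100,000 rows. The assertion is identical at every size; only the work behind it changes.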

How do I run the performance scanner?

It runs automatically in every brasscoders scan — no flag needed. Install with pipx install brasscoders and run brasscoders --offline scan; performance findings land in .brass/ai_instructions.yaml alongside the security and secrets findings.