Why BrassCoders

1. AI reviewers err on the side of completeness.

The training incentive for Claude / GPT / Gemini-class reviewers rewards thoroughness. That's correct in the abstract — better to over-flag than miss a real bug. In practice, it produces output where every PR review has eight items and seven are conditional ("if name is user-controlled, this could be XSS").

Developers compensate by skimming. Skimming has a known failure mode: the one real critical issue gets dismissed alongside the seven speculative ones. The output becomes uniformly low-trust.

This is a known industry-level problem. The Hacker News thread "There is an AI code review bubble" discusses it at length. Anthropic acknowledged in April 2026 that a verbosity-constraint change "caused a 3% drop in coding quality evaluations." The Register's April 23, 2026 piece on Claude Opus 4.7 false-positive AUP refusals covered the same failure shape from a different angle.

2. A noise filter is the right intervention.

You can't fix the underlying training incentive — that's Anthropic / OpenAI / Google's problem and they're working on it. What you can do is intercept the output and apply a deterministic filter before a human ever sees it.

BrassCoders's filter is small and explainable: a confidence threshold per finding type, a style-issue allowlist (tolerated style codes such as Pylint C0301 are dropped from the output), per-file caps so no single file dominates, and a hard "CRITICAL severity always survives" rule. The math fits in a 60-line Python module. There's no LLM in the loop; the reasoning is auditable.
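
The four rules above can be sketched in a few lines of Python. This is a minimal illustration, not BrassCoders's actual module: the `Finding` fields, the threshold values, the per-file cap of 3, and the choice to rank by confidence before applying the cap are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Every constant here is a hypothetical value chosen for illustration.
STYLE_CODES_TO_DROP = {"C0301"}  # Pylint line-too-long
MIN_CONFIDENCE = {"security": 0.6, "complexity": 0.7}
DEFAULT_MIN_CONFIDENCE = 0.8
MAX_PER_FILE = 3


@dataclass(frozen=True)
class Finding:
    path: str
    code: str
    kind: str          # finding type, e.g. "security" or "style"
    severity: str      # "CRITICAL", "HIGH", "MEDIUM", or "LOW"
    confidence: float  # 0.0 to 1.0


def filter_findings(findings):
    """Return the findings that survive the four rules, strongest first."""
    survivors = []
    kept_per_file = defaultdict(int)
    # Rank by confidence so the per-file cap keeps the strongest findings.
    ranked = sorted(findings, key=lambda f: f.confidence, reverse=True)
    for f in ranked:
        # Rule 4: CRITICAL always survives and does not consume the cap.
        if f.severity == "CRITICAL":
            survivors.append(f)
            continue
        # Rule 2: tolerated style codes are dropped.
        if f.code in STYLE_CODES_TO_DROP:
            continue
        # Rule 1: confidence threshold per finding type.
        if f.confidence < MIN_CONFIDENCE.get(f.kind, DEFAULT_MIN_CONFIDENCE):
            continue
        # Rule 3: per-file cap so no single file dominates.
        if kept_per_file[f.path] >= MAX_PER_FILE:
            continue
        kept_per_file[f.path] += 1
        survivors.append(f)
    return survivors
```

Because every step is a plain comparison against a constant, each dropped finding can be traced to exactly one rule, which is the sense in which a filter of this kind is auditable.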

That filter then runs against two inputs: (a) BrassCoders's own scanners (detect-secrets, bandit, pylint, radon, plus AI-coder anti-pattern AST analysis), and (b) any third-party AI reviewer's JSON output, via `brasscoders filter`. Either way, the artifact you read is already triaged.
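
Feeding two kinds of input through one filter implies some normalization step that maps each source onto a shared record shape. The sketch below shows one way that could look for reviewer JSON. The input schema (a JSON array of objects with `file`, `rule`, `severity`, and `confidence` keys), the severity aliases, and the function name are all hypothetical; the real format accepted by `brasscoders filter` is not specified here.

```python
import json

# Hypothetical mapping from reviewer-specific labels to shared severities.
SEVERITY_ALIASES = {
    "critical": "CRITICAL",
    "error": "HIGH",
    "high": "HIGH",
    "warning": "MEDIUM",
    "medium": "MEDIUM",
    "info": "LOW",
    "low": "LOW",
}


def normalize_reviewer_json(text):
    """Parse an assumed reviewer JSON array into uniform finding records."""
    records = []
    for item in json.loads(text):
        label = str(item.get("severity", "")).lower()
        records.append({
            "path": item.get("file", "<unknown>"),
            "code": item.get("rule", "UNSPECIFIED"),
            # Unrecognized labels fall back to LOW (an assumption).
            "severity": SEVERITY_ALIASES.get(label, "LOW"),
            # A missing confidence is treated as 0.0 (an assumption).
            "confidence": float(item.get("confidence", 0.0)),
            "source": "reviewer",
        })
    return records
```

Once scanner output and reviewer output share one shape, the same deterministic rules can apply to both without caring where a finding came from.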

3. Why not the alternatives?

Why not just use Claude Code's built-in review?

BrassCoders doesn't replace it; BrassCoders filters its output. `brasscoders filter` takes Claude Code's review JSON in and emits the survivors. Use both.

Why not `claude-mem` for memory between sessions?

claude-mem is excellent and we recommend it. Different problem: claude-mem remembers what was said; BrassCoders filters what was found. They compose well — you can use both, and the memory layer doesn't conflict with the review filter.

Why not Greptile / Cursor's review / SonarQube?

Those generate findings. BrassCoders triages findings. The problem isn't "we need more findings"; it's "we need fewer findings that are higher signal." BrassCoders is the layer that makes any of the others useful instead of overwhelming.

Why CLI and not an IDE plugin?

CLI runs in CI, in pre-commit hooks, in editor-on-save, on a server. IDE plugins run in one IDE. Composability wins.

Why not a SaaS?

BrassCoders scans private source code. We don't want to be a SOC 2 audit target; you don't want to upload your code to a vendor for static analysis. Run it on your machine; output stays on your disk.

Reproducible benchmarks

We don't ship vague claims. BrassCoders runs against a curated set of pinned public projects — intentionally-vulnerable training corpora plus mature real-world codebases — with finding counts and runtimes you can reproduce.

Read the latest benchmark results