The case for a noise filter, in three parts.
The training incentive for Claude / GPT / Gemini-class reviewers favors thoroughness. That's correct in the abstract — better to over-flag than miss a real bug. In practice, it produces output where every PR review has eight items and seven are conditional ("if name is user-controlled, this could be XSS").
Developers compensate by skimming. Skimming has a known failure mode: the one real critical issue gets dismissed alongside the seven speculative ones. The output becomes uniformly low-trust.
This is a known industry-level problem. The Hacker News thread "There is an AI code review bubble" has the discussion. Anthropic acknowledged in April 2026 that a verbosity-constraint change "caused a 3% drop in coding quality evaluations." The Register's April 23 2026 piece on Claude Opus 4.7 false-positive AUP refusals covered the same shape from a different angle.
You can't fix the underlying training incentive — that's Anthropic / OpenAI / Google's problem and they're working on it. What you can do is intercept the output and apply a deterministic filter before a human ever sees it.
Brass's filter is small and explainable: a confidence threshold per finding type, a style-issue denylist (Pylint C0301 et al. drop), per-file caps so no single file dominates, and a hard "CRITICAL severity always survives" rule. The math fits in a 60-line Python module. There's no LLM in the loop; the reasoning is auditable.
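Here's a minimal sketch of that shape. The threshold values, the Finding fields, and the denylisted codes below are illustrative stand-ins, not Brass's shipped defaults:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Finding:
    # Hypothetical shape; the real Brass module's fields may differ.
    file: str
    rule: str          # scanner rule ID, e.g. "C0301" or "B608"
    type: str          # "security", "complexity", "style", ...
    severity: str      # "LOW" | "MEDIUM" | "HIGH" | "CRITICAL"
    confidence: float  # scanner-reported confidence, 0.0 to 1.0

# Illustrative numbers, not Brass's shipped defaults.
CONFIDENCE_FLOOR = {"security": 0.5, "complexity": 0.7, "style": 0.9}
STYLE_DENYLIST = {"C0301", "C0114", "C0115"}  # known-noise Pylint codes
PER_FILE_CAP = 5

def filter_findings(findings: list[Finding]) -> list[Finding]:
    survivors: list[Finding] = []
    per_file: dict[str, int] = defaultdict(int)
    # Highest confidence first, so the per-file cap keeps the best findings.
    for f in sorted(findings, key=lambda x: x.confidence, reverse=True):
        if f.severity == "CRITICAL":
            survivors.append(f)   # hard rule: CRITICAL always survives
            continue
        if f.rule in STYLE_DENYLIST:
            continue              # denylisted style noise drops
        if f.confidence < CONFIDENCE_FLOOR.get(f.type, 0.8):
            continue              # below the per-type confidence threshold
        if per_file[f.file] >= PER_FILE_CAP:
            continue              # no single file dominates the report
        per_file[f.file] += 1
        survivors.append(f)
    return survivors
```

The entire decision trail is a handful of comparisons, which is what makes it auditable.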
That filter then runs against two inputs: (a) Brass's own scanners (detect-secrets, bandit, pylint, radon, plus AI-coder anti-pattern AST analysis), and (b) any third-party AI reviewer's JSON output, via brassai filter. Either way, the artifact you read is already triaged.
Brass doesn't replace Claude Code's built-in review; Brass filters its output. brassai filter takes Claude Code's review JSON in and emits the survivors. Use both.
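If you'd rather do that step as a library call than a CLI invocation, the adapter is a few lines. This continues the filter_findings sketch above; the JSON field names are assumptions, not a documented Claude Code schema:

```python
import json
import sys

# Read a JSON array of reviewer findings from stdin, emit only the survivors.
# Field names here are guesses; map them to your reviewer's actual schema.
findings = [
    Finding(
        file=item.get("path", "<unknown>"),
        rule=item.get("rule", ""),
        type=item.get("category", "style"),
        severity=str(item.get("severity", "LOW")).upper(),
        confidence=float(item.get("confidence", 0.0)),
    )
    for item in json.load(sys.stdin)
]
json.dump([vars(f) for f in filter_findings(findings)], sys.stdout, indent=2)
```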
claude-mem for memory between sessions? claude-mem is excellent and we recommend it. Different problem: claude-mem remembers what was said; Brass filters what was found. They compose well; the memory layer doesn't conflict with the review filter.
Other scanners and AI reviewers generate findings. Brass triages findings. The problem isn't "we need more findings"; it's "we need fewer findings that are higher signal." Brass is the layer that makes any of the others useful instead of overwhelming.
A CLI runs in CI, in pre-commit hooks, in editor-on-save, on a server. IDE plugins run in one IDE. Composability wins.
Brass scans private source code. We don't want to be a SOC2 audit target; you don't want to upload your code to a vendor for static analysis. Run it on your machine; output stays on your disk.
We don't ship vague claims. The benchmark script clones nine pinned public Python projects, runs Brass on each, and reports finding counts and runtimes — reproducible on any machine.
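A harness with that shape fits in a screen of Python. In this sketch the repo list, the commit pins, and the brassai invocation are all placeholders; the real script pins nine specific projects, and you should substitute its actual scan command:

```python
import json
import subprocess
import tempfile
import time
from pathlib import Path

# Placeholders: the real benchmark pins nine specific projects and commits.
REPOS = {
    "example-a": ("https://github.com/org/example-a", "deadbeef"),
    "example-b": ("https://github.com/org/example-b", "cafef00d"),
}
# Placeholder invocation; substitute Brass's real scan command.
BRASS_CMD = ["brassai", "scan", "--json"]

for name, (url, commit) in REPOS.items():
    workdir = Path(tempfile.mkdtemp(prefix=f"brass-bench-{name}-"))
    subprocess.run(["git", "clone", "--quiet", url, str(workdir)], check=True)
    subprocess.run(["git", "-C", str(workdir), "checkout", "--quiet", commit],
                   check=True)
    start = time.monotonic()
    result = subprocess.run([*BRASS_CMD, str(workdir)],
                            capture_output=True, text=True)
    elapsed = time.monotonic() - start
    count = len(json.loads(result.stdout or "[]"))
    print(f"{name}: {count} findings in {elapsed:.1f}s")
```

Pinning commits is what makes the counts comparable across machines: same inputs, same scanners, same numbers.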