Copilot Does X. BrassCoders Does Y.

AI code review and deterministic static analysis are complementary layers, not competitors. This piece covers the math of running both, the hand-off prompt, and why replacing one with the other is wrong.

Copper Sun Brass Team · 7 min read
comparison · ai-code-review

GitHub Copilot has 4.7 million paid subscribers. Cursor has over 1 million paying users and $2 billion in annual recurring revenue. Anthropic’s Claude Code crossed a $2.5 billion run-rate. The AI coding assistant market is decided. The question that remains is what to put underneath it.

BrassCoders, the bug scanner for AI coders, is the deterministic layer that fills the gap. This piece walks through the precise division of labor — what Copilot catches well, what BrassCoders catches that Copilot misses, the math of running both, and the hand-off prompt that ties the two layers together.

This is the fourth piece in BrassCoders’s AI Coding Assistant Blind Spots pillar — the conversion-focused piece that answers the “I already have Copilot” objection.

The Two Layers Of Code Review

BrassCoders treats AI code review and deterministic static analysis as complementary layers of the same workflow. The AI layer judges what looks right in context. The static layer catches what is deterministically wrong. Each layer does the work the other cannot do.

The AI layer’s strength is context. An LLM reading a diff can tell whether a particular pattern matters given the surrounding code: this md5 call is for cache keys, that md5 call is for password hashing. The model has read enough code to know which contexts make which patterns dangerous. It applies that judgment per finding.
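The md5 contrast can be made concrete with a minimal sketch. Both function names below are hypothetical and chosen for illustration; the point is that the two calls have the same shape, so a pattern match alone cannot tell them apart, while a reader with context can.

```python
import hashlib


def cache_key(url: str) -> str:
    # Non-security use: the digest only needs to be stable and cheap to compute.
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def hash_password_insecure(password: str) -> str:
    # Security use: the identical call is dangerous here, because md5 is
    # unsalted and fast to brute-force. Shown only as the anti-pattern.
    return hashlib.md5(password.encode("utf-8")).hexdigest()
```

A rule that flags every hashlib.md5 call reports both functions. Deciding that only the second one matters is the contextual judgment described above.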

The static layer’s strength is exhaustive search. Static analyzers run every rule against every file, deterministically. The same input produces the same output, every time. There is no attention budget. There is no context window. A bug whose evidence spans ten files is just as findable as a bug in a single function.
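As a rough illustration of "every rule against every file", here is a minimal sketch of a deterministic scan loop. The two regex rules and the Finding type are hypothetical stand-ins, not BrassCoders's detectors, and the sketch shows only exhaustiveness and determinism, not cross-file analysis.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    path: str
    line: int
    rule: str


# Hypothetical rules for illustration; real analyzers use far richer detectors.
RULES = {
    "eval-call": re.compile(r"\beval\("),
    "weak-hash-md5": re.compile(r"\bhashlib\.md5\("),
}


def scan(files: dict[str, str]) -> list[Finding]:
    """Apply every rule to every line of every file, in a fixed order."""
    findings = []
    for path in sorted(files):
        for lineno, text in enumerate(files[path].splitlines(), start=1):
            for rule, pattern in sorted(RULES.items()):
                if pattern.search(text):
                    findings.append(Finding(path, lineno, rule))
    return findings
```

Because the iteration order is fixed and nothing is sampled, calling scan twice on the same input returns the same list both times.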

The pairing covers more ground than either alone. The static layer surfaces every match. The AI layer reads each match in context and decides whether it matters. The output is a ranked list of findings the developer actually addresses.

What Copilot Catches Well

BrassCoders frames Copilot’s strengths precisely. The categories where contextual judgment is the load-bearing requirement are where Copilot performs best — it reads the surrounding code, infers intent, and judges patterns by their use rather than their shape.

Style and clarity. Copilot identifies inconsistent naming, missing type annotations, awkward control flow, and dead code. The judgments are subjective, and the LLM applies the judgment well because the training data is full of style critiques. Code that looks wrong but is technically correct is Copilot’s natural territory.

Common bugs in single-file context. Copilot catches off-by-one errors, missing null checks, unhandled exceptions, and bad equality checks when the bug is visible in the file under review. The category requires reading the code carefully, not searching exhaustively across the repository. Copilot is good at the careful read.

Refactor suggestions. Copilot proposes how to break up a long function, name a variable better, or extract a helper. The proposals are useful when accepted and harmless when ignored. The category is generative; Copilot is generative.

What Copilot does not catch, by category, is what BrassCoders catches.

What BrassCoders Catches That Copilot Misses

BrassCoders catches the seven blind-spot categories the pillar lays out: cross-file taint flows, hardcoded credentials past comment boundaries, hallucinated package imports, race conditions, context-dependent insecure patterns, auth-middleware misconfigurations, and PII flows across call paths. Each category has a deterministic detector that runs locally with zero outbound calls in the OSS core.

The pillar walks each category in detail at AI Coding Assistant Blind Spots.

The structural reason. Each of the seven categories requires either exhaustive search across the codebase (find every match for this pattern in every file) or interprocedural analysis (trace this flow across function boundaries). LLMs do neither well. ACM TOSEM 2026 documented Copilot’s failure on the injection cases. USENIX Security 2025 documented hallucinated imports at 19.7% across major models. The categories are not solved by better prompts.
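To show why a hallucinated import is a deterministic check rather than a judgment call, here is a minimal sketch. It is an assumption-laden local approximation: it asks whether each imported top-level module resolves in the current Python environment using the standard library's importlib.util.find_spec, which is not the same as querying a package registry and is not presented as how BrassCoders implements its detector. The function name unresolved_imports is hypothetical.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level imported module names that do not resolve locally."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    # find_spec returns None for a top-level name that cannot be found.
    return sorted(n for n in names if importlib.util.find_spec(n) is None)
```

The answer for a given environment is a lookup, not an opinion: the name either resolves or it does not. A missing name can also mean an uninstalled dependency, so a real detector would need a registry lookup to separate the two cases.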

The concrete difference per category.

Cross-file SQL injection. Copilot misses it; BrassCoders catches it via Pyre/Pysa.

Hardcoded API key past a # example only comment. Copilot misses it because the comment looks like documentation; BrassCoders catches it via detect-secrets entropy and format patterns.

import fastapi_users_pydantic (a package that doesn’t exist). Copilot misses it; BrassCoders catches it via --check-package-hallucination.

@app.route('/admin') with no @login_required. Copilot might catch it in isolation but regularly misses it across files; BrassCoders catches it via the auth-pattern analyzer.

The Math Of Running Both

BrassCoders ran reproducible benchmarks across nine open-source codebases and consistently saw a 50× to 70× reduction from raw scanner output to ranked enrichment output. The math: 1,500 raw findings become 30 ranked findings, and the AI assistant processes the 30 in under two minutes.

The numbers. A Django reference codebase produces ~1,800 raw findings across BrassCoders’s 12 scanners. The Paid plan’s AI-powered enrichment deduplicates findings that name the same root cause and reranks by relevance to the project signature. The output is ~30 critical-issue findings, ranked. The reproducible scan instructions and pinned commit hashes are at coppersun.dev/benchmarks.
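The reduction factors follow directly from the figures quoted in this section. A short worked check, using only those figures (the function name is illustrative):

```python
def reduction_factor(raw_findings: int, ranked_findings: int) -> float:
    """Ratio of raw scanner findings to ranked findings after enrichment."""
    if ranked_findings <= 0:
        raise ValueError("ranked_findings must be positive")
    return raw_findings / ranked_findings


print(reduction_factor(1500, 30))  # 50.0, the 1,500 -> 30 example
print(reduction_factor(1800, 30))  # 60.0, the Django reference codebase
```

Both results fall inside the quoted 50× to 70× range.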

What the developer does with the 30. The hand-off prompt to Claude Code or Cursor: “Read .brass/ai_instructions.yaml. Address the critical_issues in order. For each one, propose a diff and explain the fix.” The AI assistant reads the file, walks the findings, applies contextual judgment per finding, and proposes a diff for each one. The developer reviews the diffs and approves.

Time math. A 30-finding review takes a senior engineer roughly 10 minutes with AI-assisted triage. A 1,500-finding review takes nobody, because nobody reads 1,500 findings. The reduction from unreadable to readable is the value. The reduction from readable to ranked is the secondary value.

The Hand-Off Prompt

BrassCoders ships with a single canonical hand-off prompt that works across every AI assistant with file-read capability. The prompt is three short sentences, the file path is fixed, and the action is explicit.

The prompt:

Read .brass/ai_instructions.yaml in this project.
Address the critical_issues in order. For each one,
propose a diff and explain the fix.

Tested against Claude Code, Cursor, Continue, Aider, and GitHub Copilot Workspace. Each tool reads the YAML, walks the critical_issues array, and proposes diffs per finding. The output format varies (Claude Code emits patches inline, Cursor opens the file at the noted line, Aider applies the diff directly), but the input is identical.

The prompt is the integration. There is no plugin to install, no extension to configure, no API key to manage. BrassCoders writes a file to the local filesystem; the AI assistant reads the file. The architecture is filesystem-native because filesystem-native survives every IDE migration and every AI assistant change.
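Because the integration is just a file and a prompt, a wrapper script needs very little. The sketch below is a hypothetical convenience helper, not something BrassCoders ships: it confirms the scan output exists at the fixed path before returning the prompt text, and it makes no assumptions about the YAML's contents.

```python
from pathlib import Path

# Fixed path and prompt text as given in this article.
HANDOFF_FILE = Path(".brass") / "ai_instructions.yaml"
HANDOFF_PROMPT = (
    "Read .brass/ai_instructions.yaml in this project. "
    "Address the critical_issues in order. For each one, "
    "propose a diff and explain the fix."
)


def handoff_prompt(project_root: Path) -> str:
    """Return the hand-off prompt, or raise if the scan output is missing."""
    target = project_root / HANDOFF_FILE
    if not target.is_file():
        raise FileNotFoundError(f"{target} not found; run a scan first")
    return HANDOFF_PROMPT
```

The returned string can be pasted into whichever assistant is in use. Nothing in the helper is specific to one tool.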

When Replacing One With The Other Is Wrong

BrassCoders is the deterministic complement to AI code review and is not a replacement for it. The setup that works runs both layers. Replacing either with the other leaves a coverage gap that the missing layer was filling.

Replacing Copilot with BrassCoders is wrong because BrassCoders does not generate code, does not refactor, does not suggest style improvements. BrassCoders is a scanner, not a coding assistant. The developer still wants the AI assistant for the actual writing, completion, and judgment work.

Replacing BrassCoders with Copilot is wrong because Copilot does not run exhaustive search across the codebase, does not trace taint flows interprocedurally, does not check registries for hallucinated imports. Copilot is a coding assistant, not a scanner. The developer still wants the static layer for the categories where exhaustive search is the requirement.

The cost math supports running both. Copilot Individual is $10/month per user, Business is $19. Cursor is $20/month. BrassCoders Paid is $12/dev/month. Copilot plus BrassCoders costs $22-$31/dev/month, and Cursor plus BrassCoders costs $32, comparable to CodeRabbit’s $24-$48/seat or Greptile’s $30/seat for LLM-only PR review. Running both is not a multiplier; it is a layered cost structure for layered coverage.
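The combined prices are simple sums of the list prices quoted above. A short worked check, using only those figures (the names are illustrative):

```python
# Monthly list prices quoted in this article, in USD per developer.
BRASSCODERS_PAID = 12
ASSISTANTS = {
    "Copilot Individual": 10,
    "Copilot Business": 19,
    "Cursor": 20,
}


def layered_cost(assistant_price: int, scanner_price: int = BRASSCODERS_PAID) -> int:
    # Layered pricing is additive: one assistant seat plus one scanner seat.
    return assistant_price + scanner_price


for name, price in ASSISTANTS.items():
    print(f"{name} + BrassCoders Paid: ${layered_cost(price)}/dev/month")
```

This gives $22 and $31 for the two Copilot tiers and $32 for Cursor.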

Install BrassCoders alongside the AI assistant you already have: run pipx install brasscoders, then run a scan. Hand the YAML to Copilot, Cursor, Continue, or whatever assistant you use. The two layers compose immediately.

Frequently Asked Questions

Should I replace Copilot with BrassCoders?

No. BrassCoders is the deterministic complement, not a competitor. Copilot judges what looks right in context. BrassCoders catches what is deterministically wrong. Replacing either with the other leaves a coverage gap. The setup that works runs both.

Does this work with Cursor, Continue, and Aider?

Yes. BrassCoders writes a YAML file to .brass/ai_instructions.yaml. Any AI assistant with file-read capability — Claude Code, Cursor, Continue, Aider, GitHub Copilot Workspace — reads the file directly. The hand-off prompt is the same across tools.

What is the hand-off prompt?

Read .brass/ai_instructions.yaml in this project. Address the critical_issues in order. For each one, propose a diff and explain the fix. The prompt is the same for every AI assistant; it works because the YAML is plain text with a stable schema.

Does running both layers cost more?

BrassCoders OSS is free. Paid is $12/dev/month. Copilot is $10/user/month for Individual, $19 for Business. Cursor is $20/month. Running both layers on the Paid plan costs $22-$32/dev/month, comparable to LLM-only PR review tools (CodeRabbit at $24-$48/seat, Greptile at $30/seat). The combined cost is the LLM tool plus BrassCoders, not a multiplier.

Is the math actually 30 findings?

On the reproducible benchmarks BrassCoders runs against nine open-source codebases (Django, FastAPI, NodeGoat, PyGoat, etc.), the raw scan typically produces 800-2,000 findings, and the Paid plan's AI-powered enrichment reduces that to 25-45 ranked findings. The exact number varies by codebase. The benchmarks at coppersun.dev/benchmarks are pinned to specific commit hashes so the numbers are reproducible.