How to Triage a 500-Line AI Pull Request in 10 Minutes

A worked example of BrassCoders plus an AI assistant doing real PR review work. Scan locally, hand the ranked output to Claude Code or Cursor, walk each finding to a diff. Total reviewer time stays roughly constant regardless of diff size.

Copper Sun Brass Team · 9 min read
ai-code-review · engineering · oss-core

BrassCoders plus an AI assistant turns a 45-60 minute review of a 500-line AI-generated pull request into roughly 10 minutes of focused work. The pattern is well-defined: scan locally, hand the ranked YAML to your AI assistant, walk each finding to a proposed diff, approve the diffs. Total reviewer time stays roughly constant regardless of whether the underlying PR is 100 lines or 1000.

This post walks through the workflow end to end with a concrete worked example, then covers why the same approach works across Claude Code, Cursor, Continue, and Aider — and what to do when BrassCoders surfaces a finding you disagree with.

Why Reviewing AI Diffs Line-by-Line Doesn’t Scale

The failure mode BrassCoders has measured in line-by-line AI-PR review: a senior engineer spends an hour on a 500-line diff and still misses real bugs, because the noise drowns out the handful of changes that matter. The fix isn’t more diligence; it’s a different review unit.

The math is straightforward. A 500-line AI-generated diff typically contains 30-50 changes worth examining (real refactors, real new logic) and another 400-450 lines of mechanical follow-ons (formatter passes, import reorganizations, comment additions, type-annotation churn). A line-by-line review forces the reviewer to read all 500 lines to find the 30-50 that matter. The cognitive cost of skimming the 400 boring lines is what produces the failure mode: by line 300 the reviewer is in fast-scan mode, and the one real bug that hid on line 312 gets dismissed alongside the seven boring formatter changes.

What the data says: BrassCoders’s published benchmarks measure the ratio of “raw findings emitted by scanners” to “actionable findings after enrichment” on nine open-source codebases, and the typical ratio is 5-7x. In the 500-line worked example below, 47 raw scanner findings reduce to 12 critical issues (roughly a 4x reduction) that an AI reviewer (or a human) actually needs to examine.

The architectural shift: stop asking the reviewer to find the 12 findings inside a 500-line diff. Instead, hand them the 12 findings directly. The 500-line diff becomes an artifact of context, not the unit of work.

The BrassCoders Workflow in Three Steps

BrassCoders’s recommended workflow has three steps: scan, hand off, iterate. The scan produces a YAML file at .brass/ai_instructions.yaml; the hand-off is a single prompt to your AI assistant pointing at that file; the iteration walks each finding to a proposed diff, which the human reviewer approves or rejects.

Step 1: scan. From the repo root, run brasscoders scan . (or brasscoders scan path/to/project). The OSS core makes zero outbound network calls; the Paid plan sends already-redacted findings (never source) to the gateway for semantic deduplication and reranking. Output lands in .brass/. The relevant file for AI hand-off is .brass/ai_instructions.yaml — a short, ranked summary file.
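If you want Step 1 to run from a script or a pre-review hook, a minimal Python sketch can wrap the documented command. It relies only on what is stated above (running brasscoders scan . from the repo root writes .brass/ai_instructions.yaml); SCAN_CMD, instructions_path, and run_scan are hypothetical names, and the sketch assumes the CLI is already on PATH.

```python
import shutil
import subprocess
from pathlib import Path

# The documented invocation: run from the repo root and scan "."
SCAN_CMD = ["brasscoders", "scan", "."]

def instructions_path(repo_root: Path) -> Path:
    """Location of the AI hand-off file the scan writes under .brass/."""
    return repo_root / ".brass" / "ai_instructions.yaml"

def run_scan(repo_root: Path) -> Path:
    """Scan repo_root and return the path to ai_instructions.yaml.

    Hypothetical helper; assumes the brasscoders CLI is on PATH.
    """
    if shutil.which("brasscoders") is None:
        raise RuntimeError("brasscoders not found on PATH (pipx install brasscoders)")
    # cwd=repo_root mirrors running `brasscoders scan .` from the repo root
    subprocess.run(SCAN_CMD, cwd=repo_root, check=True)
    out = instructions_path(repo_root)
    if not out.is_file():
        raise FileNotFoundError(f"scan finished but {out} is missing")
    return out
```

Calling run_scan(Path.cwd()) from the repo root returns the file to hand to the assistant; check=True makes a failed scan raise instead of silently handing off a stale YAML.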

Step 2: hand off to your AI assistant. The prompt template that works:

“Read .brass/ai_instructions.yaml in this project. Work through each entry in critical_issues in order. For each one, read the file at the noted line, decide whether the finding is real or a false positive given the surrounding code, and if real propose a diff. If you’re unsure, mark it for human review and move on.”

Claude Code, Cursor, Continue, and Aider all have filesystem access — they read the YAML directly, no copy-paste required. The AI works through findings sequentially.
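The prompt asks the assistant to read each flagged line in its surrounding code. If you want to pre-extract that context yourself (for a tool without file access, or for the human pass in Step 3), a small Python sketch could look like this. finding_context and its radius default are hypothetical; it assumes findings carry the file, line, type, and severity keys shown in the YAML excerpt later in this post, with paths relative to the repo root.

```python
from pathlib import Path

def finding_context(root: Path, finding: dict, radius: int = 5) -> str:
    """Return a numbered window of source lines around one finding.

    Hypothetical helper. Assumes finding["file"] is relative to the repo root
    and finding["line"] is 1-indexed, as in ai_instructions.yaml.
    """
    lines = (root / finding["file"]).read_text(encoding="utf-8").splitlines()
    target = finding["line"]
    start = max(1, target - radius)
    end = min(len(lines), target + radius)
    header = f'{finding["severity"]} {finding["type"]} at {finding["file"]}:{target}'
    rows = [header]
    for n in range(start, end + 1):
        marker = ">>" if n == target else "  "  # flag the reported line
        rows.append(f"{marker} {n:4d} | {lines[n - 1]}")
    return "\n".join(rows)
```

Keeping the window small is the point: the reviewer (human or AI) sees the flagged line and just enough code to judge whether the finding is real, not the whole 500-line diff.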

Step 3: iterate. The AI surfaces each critical issue with a proposed diff. You review the diff (much faster than reviewing the original 500-line PR, because each diff is small and targeted), approve or reject it, and move to the next. Findings the AI flagged as uncertain get human-only review. Re-run brasscoders scan after the fixes to confirm nothing regressed.

The total elapsed time on a typical 500-line PR: scan ~10 seconds, AI walking critical issues ~2-3 minutes, human reviewing the proposed diffs ~5-10 minutes. That sums to roughly 7-13 minutes of focused work (about 10 in the worked example below) versus 45-60 minutes of line-by-line slog.

What’s in ai_instructions.yaml

BrassCoders’s ai_instructions.yaml is the AI consumer’s view of the scan: a short, ranked file (typically 100-500 lines versus the raw scanner output of 3000-10000 lines) containing the keys an AI assistant needs to triage the codebase. The format is stable across releases; AI tools can rely on its structure.

The top-level keys:

project: brasscoders
scanned_at: 2026-05-31T14:23:17Z
brass_version: 2.0.4
how_to_read_this_file: |
  Critical_issues are ranked first; address them in order...
critical_issues:
  - file: src/auth/login.py
    line: 47
    type: HARDCODED_SECRET
    severity: CRITICAL
    description: AWS access key embedded in source
    finding_id: 4a8f2e
  - ...
security_findings:
  - ...
privacy_findings:
  - ...
statistics:
  total_findings: 47
  after_enrichment: 12
  by_severity: { CRITICAL: 4, HIGH: 8, MEDIUM: 24, LOW: 11 }

The critical_issues list is what the AI consumer focuses on. These are the findings ranked highest by severity and project-signature relevance: the ones where missing the fix has real consequences. Below critical_issues are the lower-severity lists (security_findings, privacy_findings, etc.), which the AI consumer can address opportunistically or skip.
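A consumer of this file only needs the ordering just described. A minimal Python sketch, assuming the YAML has already been parsed into a dict (for instance with PyYAML's yaml.safe_load, not imported here), builds the review queue and checks that the severity breakdown accounts for every raw finding. The function names and the include_lower flag are hypothetical.

```python
# Assumption: `doc` is ai_instructions.yaml already parsed into a dict.
PRIORITY = ("critical_issues", "security_findings", "privacy_findings")

def review_queue(doc: dict, include_lower: bool = False) -> list:
    """Findings in review order: critical_issues first, in file order.

    Lower-severity lists are appended only when include_lower is True.
    """
    keys = PRIORITY if include_lower else PRIORITY[:1]
    queue = []
    for key in keys:
        entries = doc.get(key) or []
        # Keep real findings; drop placeholders such as the "..." in an excerpt.
        queue.extend(entry for entry in entries if isinstance(entry, dict))
    return queue

def severity_totals_consistent(stats: dict) -> bool:
    """True if the by_severity breakdown accounts for every raw finding."""
    return sum(stats["by_severity"].values()) == stats["total_findings"]
```

On the excerpt above, 4 + 8 + 24 + 11 sums to the 47 in total_findings, so the consistency check passes.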

The how_to_read_this_file field is a literal instruction block at the top of every BrassCoders YAML. It tells the AI consumer how to interpret the rest of the file. This is the load-bearing detail: BrassCoders doesn’t assume the AI knows the format — it tells the AI explicitly what each section means and what the priority order is. The result is consistent triage behavior across Claude Code, Cursor, Continue, and Aider, because they’re all reading the same instruction block.

Worked Example: A 500-Line AI-Generated Refactor

BrassCoders scans a representative 500-line AI-generated refactor of a Flask handler in 8 seconds and produces 12 critical_issues out of 47 raw findings. The downstream AI consumer (Claude Code in this example) then walks all 12 issues in about 3 minutes.

The setup. A developer asks Claude Code to refactor an existing Flask request handler for clarity. Claude Code rewrites the entire 500-line file: extracts helper functions, adds type annotations, replaces a deprecated library call, adds error handling, reformats with black. The PR is 500 lines of diff. Without BrassCoders, the review would be a line-by-line read of all 500 lines.

The scan. Run brasscoders scan . in the repo. Output:

Scanning project (12 scanners)...
✓ bandit            42 raw findings
✓ pylint            18 raw findings
✓ pyre              0 raw findings
✓ semgrep           8 raw findings
✓ detect-secrets    2 raw findings (HARDCODED_SECRET)
✓ brass-ai-pattern  3 raw findings
✓ ...
Total: 47 raw → 12 critical_issues after enrichment
Output: .brass/ai_instructions.yaml
Time: 8.3s

The hand-off. Open a Claude Code session and paste:

“Read .brass/ai_instructions.yaml. Work through each entry in critical_issues in order. For each one, read the noted file at the noted line, decide whether the finding is real or a false positive given the surrounding code, and if real propose a diff. If you’re unsure, mark it for human review and move on.”

Claude Code reads the YAML, then walks the 12 issues. The first finding is the AWS access key on line 47 — Claude proposes a diff that moves the key to an environment variable. The second is a SQL injection sink the refactor introduced by interpolating a request parameter into a query string — Claude proposes parameterizing. The third is a use of assert for input validation that gets stripped under python -O — Claude proposes raising an explicit exception. And so on.
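The three fixes follow standard patterns. A minimal Python sketch of each is below; it uses the standard-library sqlite3 driver as a stand-in for whatever database layer the Flask handler actually uses, and the function names, the users table, and the 1-100 page-size bound are hypothetical.

```python
import os
import sqlite3

# Fix 1: hardcoded AWS key -> read it from the environment.
# AWS_ACCESS_KEY_ID is the variable the AWS SDKs and CLI already look for.
def aws_access_key_id() -> str:
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    if not key:
        raise RuntimeError("AWS_ACCESS_KEY_ID is not set")
    return key

# Fix 2: SQL injection. The unsafe version splices user input into the SQL text.
def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    return conn.execute(f"SELECT id FROM users WHERE name = '{username}'").fetchall()

# The fixed version passes the value as a bound parameter ("?" is sqlite3's placeholder).
def find_user(conn: sqlite3.Connection, username: str) -> list:
    return conn.execute("SELECT id FROM users WHERE name = ?", (username,)).fetchall()

# Fix 3: an `assert` check disappears under `python -O`; an explicit raise does not.
def parse_page_size(raw: str) -> int:
    size = int(raw)
    if not 1 <= size <= 100:
        raise ValueError(f"page size must be between 1 and 100, got {size}")
    return size
```

With the payload x' OR '1'='1, find_user_unsafe returns every row while find_user returns none, which is exactly the difference the second finding is about.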

The result. By minute 3, Claude Code has proposed 11 diffs and flagged 1 finding as uncertain. The reviewer scans the 11 proposed diffs (each is 3-15 lines, easy to read), approves or rejects, and looks at the 1 uncertain finding themselves. Total elapsed: roughly 10 minutes including the scan, AI walk, and human approval.

What the reviewer did NOT have to do: read 500 lines of refactored code. The scan surfaced the 12 things that mattered; Claude Code triaged them against context; the human verified the proposed diffs. The AI’s review precision is high because the input was constrained (12 deterministic findings) rather than open-ended (500 lines of diff).

Works the Same With Cursor, Continue, Aider

BrassCoders’s YAML output is filesystem-resident plain text — any AI assistant with file-read capability can consume it. The same workflow runs identically across Cursor, Continue, Aider, and Claude Code; the only differences are the prompt wording each tool expects.

Each of the four (Cursor, Continue, Aider, and Anthropic’s Claude Code CLI) has first-class file-read capability; what changes is how you point it at the file.

Cursor: same prompt template; Cursor’s composer can read local files via the @ reference. Paste the prompt into the composer and reference @.brass/ai_instructions.yaml so the AI loads the file into context.

Continue: same prompt template; Continue’s @ context providers (such as @File) serve the same purpose, so reference .brass/ai_instructions.yaml in the chat input. The workflow fits Continue’s chat UI well because each step of the triage cycle is a small, file-scoped operation.

Aider: same prompt template; Aider operates with explicit file lists, so add .brass/ai_instructions.yaml to the context with /add .brass/ai_instructions.yaml. Aider’s commit-after-each-change flow pairs well with the iterate-on-findings pattern.

What’s identical across all four: the YAML is the constraint. The AI works against the ranked finding list, not against an unbounded diff. Triage time scales with the number of findings, not the size of the underlying PR.

What’s different across all four: how each tool surfaces the proposed diffs, how each commits changes, how each handles uncertainty. These are workflow ergonomics, not workflow fundamentals. Pick the assistant your team already uses; BrassCoders works the same way underneath.


The reviewer time saved on a single 500-line PR is roughly 35-50 minutes. Across an active team that ships several AI-augmented PRs a day, the savings add up to entire afternoons of recovered focus time.
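The per-PR saving scales linearly with volume. A back-of-the-envelope Python sketch, using this post's figures (45-60 minutes line-by-line versus about 10 with BrassCoders) and a hypothetical team volume of four PRs a day:

```python
# Figures from this post: a line-by-line review takes 45-60 minutes,
# the BrassCoders-assisted review about 10.
LINE_BY_LINE_MINUTES = (45, 60)
ASSISTED_MINUTES = 10

def daily_savings(prs_per_day: int) -> tuple:
    """Minutes of reviewer time saved per day, as (low, high)."""
    low, high = LINE_BY_LINE_MINUTES
    return ((low - ASSISTED_MINUTES) * prs_per_day,
            (high - ASSISTED_MINUTES) * prs_per_day)

# Hypothetical team volume: four AI-authored PRs a day.
print(daily_savings(4))  # (140, 200) minutes, roughly 2.3-3.3 hours
```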

Install BrassCoders with pipx install brasscoders and try the workflow on your next AI-generated PR. For the broader context on AI code review failure modes, see AI Code Review: The Practical Guide for 2026.

Frequently Asked Questions

Can my AI assistant just review the AI-generated PR directly?

It can, but the precision tends to be poor. An LLM reviewing diffs has the same training incentive that produces the noise problem in the first place: it errs toward completeness. The pattern that works better is to constrain the AI's review to a deterministic finding list: let static analysis decide what to look at, and let the AI decide whether each finding is real in context. The deterministic constraint makes the AI both faster and more accurate.

What's actually in .brass/ai_instructions.yaml?

A short, ranked YAML file (typically 100-500 lines, versus the raw scanner output of 3000-10000 lines) containing critical_issues, security_findings, privacy_findings, AI-pattern findings, and aggregate statistics. Each finding has a file path, line number, finding type, severity, and a short description. The file is designed to fit comfortably in any AI assistant's context window.

Does this workflow work with Cursor and Continue, not just Claude Code?

Yes. BrassCoders's YAML output is filesystem-resident plain text — any AI assistant with file-read capability can consume it. The workflow runs identically across Claude Code, Cursor, Continue, and Aider. The only differences are the prompt wording each tool expects; the underlying mechanics are the same.

How long does a BrassCoders-augmented review actually take?

For a 500-line AI-generated diff, expect 8-15 seconds for the scan, 2-3 minutes for the AI to walk the critical findings, and another 5-10 minutes of human review for the proposed diffs and any uncertain items the AI flagged. Total: roughly 10 minutes for what would otherwise be a 45-60 minute review.

What if BrassCoders surfaces a finding I disagree with?

Treat it the same as any other false positive: dismiss the finding in your review, optionally add a .brassignore entry to skip the pattern in the future, and move on. BrassCoders is the pattern-reporter, not the decision-maker. The architectural principle is that BrassCoders should be dumb-but-honest about what it sees; the AI consumer (and the human reviewer) make the contextual call about whether it matters.