Your SAST Dashboard Has 1,500 Findings
A static-analysis report with 1,500 findings is functionally a report with zero findings. Nobody reads it. The dashboard sits in the build pipeline, the findings accumulate, the developers route them to wontfix or wontread, and the next release ships unchanged. The bottleneck is not detection; the bottleneck is ranking.
BrassCoders, the bug scanner for AI coders, is the ranking layer. The OSS core surfaces the findings. The Paid plan ranks them. The math: 1,500 raw findings reduce to roughly 30 ranked ones that a developer actually addresses.
This is the fifth piece in BrassCoders’s AI Coding Assistant Blind Spots pillar — the operational answer to the question every AppSec team is already asking.
The 1,500-Finding Problem
BrassCoders ran reproducible benchmarks against nine open-source codebases and consistently saw raw scanner output in the 800-2,000 findings range per codebase. The Django reference codebase produces ~1,800 raw findings; FastAPI produces ~1,200; PyGoat produces ~900. The 1,500-finding number is the median across all nine benchmarks, not a hypothetical.
The pinned commit hashes and reproducible scan instructions are at coppersun.dev/benchmarks.
The math of unreadability. A senior engineer reads a static-analysis finding and decides “real, false-positive, or wontfix” in about 60 seconds on average, faster for obvious cases. 1,500 findings at 60 seconds per finding is 25 hours of focused triage time. No team has that budget. The triage gets delegated to whoever is least busy, the least-busy developer skims, and skimming produces the known failure mode where the one real critical bug gets dismissed alongside 99 speculative low-severity findings.
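The arithmetic above is easy to check. A minimal sketch, using only the figures from the paragraph (1,500 findings, 60 seconds each); the function name is illustrative:

```python
# Triage-time arithmetic from the paragraph above.
FINDINGS = 1_500            # raw findings on the dashboard
SECONDS_PER_FINDING = 60    # average time to reach a verdict on one finding


def triage_hours(findings: int, seconds_each: int) -> float:
    """Total focused triage time, in hours."""
    return findings * seconds_each / 3600


print(triage_hours(FINDINGS, SECONDS_PER_FINDING))  # 25.0
print(triage_hours(30, SECONDS_PER_FINDING))        # 0.5
```

At the same per-finding pace, a 30-finding list is a half-hour read rather than a 25-hour one.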
The result is a dashboard with 1,500 unresolved findings, accumulating week over week. Veracode’s State of Software Security 2026 documented dashboard fatigue across enterprise SAST programs — most teams resolve under 20% of detected findings within a quarter. The 80% that remain become permanent backlog. The backlog stops getting read.
The detection layer is fine. The ranking layer is missing.
Why Severity Sort Doesn’t Help
BrassCoders treats raw severity as a starting point, not a ranking. Severity sort puts every CRITICAL above every HIGH above every MEDIUM, ignoring whether the finding is in production code or a test fixture, whether the file is on a real code path or in an example directory. The result is a sorted list that still requires manual triage.
The failure mode. A CRITICAL finding in a test file (perhaps a hardcoded credential in a test fixture, intentionally placed for a passing test) ranks above a HIGH finding in production code (perhaps an unsanitized log statement in the main request handler). The developer reading the sorted list sees the CRITICAL first, dismisses it as a test fixture, and develops a habit of dismissing CRITICALs. The HIGH that ships to production never gets read because the developer stopped before reaching the HIGHs.
Severity sort is not wrong; it is insufficient. Raw severity reflects the abstract dangerousness of the pattern. Raw severity does not reflect whether the pattern matters in this codebase. A SQL injection in a route that does not exist outside tests is not actually a SQL injection in production. A hardcoded credential in a sample integration test is not actually a hardcoded credential leak. The context determines the verdict; severity sort drops the context.
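The failure mode is easy to reproduce. A minimal sketch, assuming a made-up `Finding` record and hypothetical file paths (none of these names come from BrassCoders itself):

```python
from dataclasses import dataclass

# Lower number sorts first; this ordering is the whole of "severity sort".
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}


@dataclass
class Finding:
    severity: str
    path: str      # hypothetical paths, for illustration only
    message: str


findings = [
    Finding("HIGH", "app/handlers/request.py", "unsanitized log statement"),
    Finding("CRITICAL", "tests/fixtures/creds.py", "hardcoded credential"),
    Finding("MEDIUM", "examples/demo.py", "illustrative medium finding"),
]


def severity_sort(items):
    """Order findings by severity alone; the file path plays no part."""
    return sorted(items, key=lambda f: SEVERITY_ORDER[f.severity])


for f in severity_sort(findings):
    print(f.severity, f.path)
```

The test-fixture credential lands first and the production request handler second, because nothing in the sort key knows where a file sits in the project.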
Better ranking requires reading the codebase, not just the finding. That is the work the Paid plan does.
What Relevance Ranking Actually Means
BrassCoders defines relevance ranking as scoring each finding against a project signature that captures what the codebase actually does. The signature is derived from the README, manifest, entrypoint, and top-level filenames — the parts of the codebase that signal intent. Findings on code paths the signature emphasizes score higher; findings on peripheral paths score lower.
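To make the idea of a signature concrete, here is a rough sketch of assembling one from the four sources named above. It is not the BrassCoders implementation: the function names, the candidate file names, the `main.py` default, and the section layout are all assumptions. The 7,500-character cap is the figure quoted in the FAQ.

```python
from pathlib import Path

MAX_SIGNATURE_CHARS = 7_500  # cap quoted in the FAQ

# Hypothetical candidate names; the real tool's lookup rules are not documented here.
README_NAMES = ("README.md", "README.rst", "README.txt")
MANIFEST_NAMES = ("pyproject.toml", "package.json")


def first_existing(root: Path, names) -> str:
    """Return the text of the first file in `names` that exists under root."""
    for name in names:
        candidate = root / name
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8", errors="replace")
    return ""


def build_signature(root: Path, entrypoint: str = "main.py") -> str:
    """Concatenate intent-bearing text and truncate it to the cap."""
    top_level = sorted(p.name for p in root.iterdir())
    parts = [
        "FILES: " + " ".join(top_level),
        "MANIFEST:\n" + first_existing(root, MANIFEST_NAMES),
        "ENTRYPOINT:\n" + first_existing(root, (entrypoint,)),
        "README:\n" + first_existing(root, README_NAMES),
    ]
    return "\n\n".join(parts)[:MAX_SIGNATURE_CHARS]
```

In this sketch the short, dense sources come first, so a long README is the part that gets truncated. That ordering is a design guess, not documented behavior.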
The mechanic. After the OSS core writes the raw 1,500 findings, the Paid plan sends the findings list (already redacted at the scanner) plus the project signature to the hosted gateway. The gateway passes both to a hosted embedding model. The model computes an embedding for each finding and an embedding for the signature, then scores each finding’s cosine similarity to the signature. High-similarity findings score high; these are the findings whose described pattern lives on the main code path the signature emphasizes. Low-similarity findings score low; these are the findings in test fixtures, example code, and third-party vendored libraries.
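The scoring step can be sketched in a few lines. The vectors below are toy three-dimensional stand-ins for real embeddings, and the finding labels are hypothetical; a hosted model would produce much longer vectors, but the cosine-similarity arithmetic is the same.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (assumed values, not model output).
signature_vec = [0.9, 0.1, 0.0]
finding_vecs = {
    "app/handlers/request.py: unsanitized log statement": [0.8, 0.2, 0.1],
    "tests/fixtures/creds.py: hardcoded credential": [0.1, 0.9, 0.2],
    "vendor/lib/util.py: vendored-library finding": [0.0, 0.2, 0.9],
}


def rank_by_relevance(signature, findings):
    """Return (score, label) pairs, most signature-like first."""
    scored = [(cosine_similarity(signature, vec), label)
              for label, vec in findings.items()]
    return sorted(scored, reverse=True)


for score, label in rank_by_relevance(signature_vec, finding_vecs):
    print(f"{score:.2f}  {label}")
```

With these toy values the request-handler finding points in nearly the same direction as the signature and ranks first, while the test-fixture and vendored findings fall to the bottom.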
The reranked list. The output is the same 1,500 findings, reordered by relevance score. The deduplication step then collapses findings that share a root cause, whether several tools flag the same call or the same pattern recurs across files (a pickle.loads call flagged by Bandit, Pylint, and the custom AI-pattern detector becomes one finding, not three). The result is a ranked list of roughly 30 unique findings, with the most codebase-relevant at the top.
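The collapse of one call flagged by three tools can be illustrated with a deliberately simple stand-in. The Paid plan is described as deduplicating with embeddings; the sketch below swaps in an exact-match key on path, line, and call, and all field names and sample records are hypothetical.

```python
# Hypothetical raw findings: three tools report the same pickle.loads call.
raw = [
    {"tool": "bandit", "path": "app/cache.py", "line": 42, "call": "pickle.loads"},
    {"tool": "pylint", "path": "app/cache.py", "line": 42, "call": "pickle.loads"},
    {"tool": "ai-pattern", "path": "app/cache.py", "line": 42, "call": "pickle.loads"},
    {"tool": "bandit", "path": "app/db.py", "line": 7, "call": "cursor.execute"},
]


def collapse(findings):
    """Merge findings that share a root-cause key, remembering who reported them."""
    merged = {}
    for f in findings:
        key = (f["path"], f["line"], f["call"])  # exact-match stand-in for similarity
        if key not in merged:
            entry = {k: v for k, v in f.items() if k != "tool"}
            entry["tools"] = []
            merged[key] = entry
        merged[key]["tools"].append(f["tool"])
    return list(merged.values())


for f in collapse(raw):
    print(f["path"], f["call"], f["tools"])
```

Four raw records become two findings, and the merged one keeps the list of tools that agreed on it.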
The mechanism is the same one that powers semantic search systems. The novelty is the application: ranking SAST findings by codebase relevance, with a CRITICAL-exemption that preserves the critical-issues tail.
The CRITICAL Exemption
BrassCoders’s enrichment pipeline includes a CRITICAL-exemption: any finding marked as critical_issues in the raw output survives every deduplication and reranking step. The exemption is explicit, tested, and intentional.
The reasoning. AI-powered ranking is a probabilistic process. Embeddings can be wrong. Reranking can demote a finding the model misjudges. For most of the finding distribution, that is fine: a mistaken demotion is recoverable, and the saved triage time is the value. For the critical tail, a mistaken demotion ships a real bug. The risk asymmetry is too steep.
The implementation. The Paid pipeline runs the dedup and rerank steps on findings tagged below CRITICAL severity. Findings tagged CRITICAL bypass the pipeline and land directly in critical_issues in the output YAML. The customer’s AI assistant reads critical_issues first, regardless of the rest of the ranked list.
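The bypass described above amounts to partitioning before any lossy step runs. A minimal sketch, assuming dictionary-shaped findings: only the critical_issues key comes from the text, while `ranked_findings`, `enrich`, and the sample records are hypothetical.

```python
def enrich(findings, rerank):
    """Route CRITICAL findings around the rerank step entirely."""
    critical = [f for f in findings if f["severity"] == "CRITICAL"]
    rest = [f for f in findings if f["severity"] != "CRITICAL"]
    return {
        "critical_issues": critical,       # never touched by rerank
        "ranked_findings": rerank(rest),   # hypothetical key for everything else
    }


sample = [
    {"id": 1, "severity": "CRITICAL"},
    {"id": 2, "severity": "HIGH"},
    {"id": 3, "severity": "MEDIUM"},
]


def worst_case_rerank(rest):
    """A deliberately bad reranker that discards everything it is given."""
    return []


result = enrich(sample, worst_case_rerank)
print(result["critical_issues"])   # [{'id': 1, 'severity': 'CRITICAL'}]
print(result["ranked_findings"])   # []
```

Even with a reranker that throws away all of its input, the CRITICAL finding survives, because it was never handed to the reranker in the first place.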
The honest trade-off. The CRITICAL exemption costs some triage time on real false-positive CRITICALs. A scanner that emits a CRITICAL on a test-fixture credential will produce a CRITICAL the developer dismisses. The dismissal cost is acceptable because the alternative — the AI pipeline demoting a real CRITICAL into the noise without flagging the demotion — is unacceptable. BrassCoders pays the false-positive cost on the critical tail and saves time on the rest.
The Heuristic-vs-AI-Powered Trade-Off
BrassCoders ships both the heuristic enrichment (OSS core) and the AI-powered enrichment (Paid plan) so the customer can pick the trade-off that fits. The OSS core is free, offline, deterministic, and ranks by severity plus deduplication signature. The Paid plan adds embedding-based dedup and project-signature reranking for $12/dev/month.
When the OSS heuristic is enough. Small codebases where the raw finding count is already manageable (200-400 findings) do not need AI-powered enrichment. The severity sort plus deduplication is sufficient triage; a developer reads through the list in an afternoon. Teams running BrassCoders for the first time on a small project often stay on the OSS core indefinitely.
When the AI-powered tier earns its $12. Larger codebases produce raw counts in the 1,000-3,000 range. The heuristic enrichment reduces to maybe 600-800 findings — still unreadable. The AI-powered enrichment reduces further, to the 30-ish actionable count. The $12 buys back ~20 hours of triage time per month per developer. The math is positive at any team size above two engineers shipping AI-generated code regularly.
The Paid plan is also the one with the project-signature reranking. The OSS core does not have signature awareness; it cannot tell that a finding in your test directory matters less than a finding in your main request handler. The Paid plan does, and that distinction is where the 30-finding output lives.
How To Try It
BrassCoders OSS core is installable with one command and runnable against any project with zero outbound network calls. The Paid plan activates with a license key after subscription; the same brasscoders scan command produces the AI-enriched output once the license is active.
OSS core:
pipx install brasscoders
brasscoders scan /path/to/project
cat /path/to/project/.brass/ai_instructions.yaml
The output shows the heuristic-ranked finding list. Read it. If the count is manageable and the rankings are useful, the OSS core is the right tier.
Paid plan:
brasscoders activate <license-key>
brasscoders scan /path/to/project
Same command. Different output. The ai_instructions.yaml now contains the AI-enriched ranking, the dedup-collapsed finding set, and the project-signature relevance scoring. The hand-off prompt to your AI assistant stays the same: “Read .brass/ai_instructions.yaml. Address the critical_issues in order.”
Subscribe at coppersun.dev/pricing. Cancel any time. The OSS core stays free either way; the Paid plan is the ranking layer that converts 1,500 findings into 30 actionable ones.
Frequently Asked Questions
What's the difference between heuristic and AI-powered enrichment?
Heuristic enrichment in the OSS core deduplicates findings by source-and-sink signature and ranks by raw severity. AI-powered enrichment in the Paid plan uses semantic embeddings to deduplicate findings that name the same root cause across different files, then reranks by relevance to a project signature derived from your README and manifest. The result is a list ranked by what matters to your codebase, not what matters in general.
Does the Paid plan ever drop a critical finding?
No. The Paid pipeline has a CRITICAL-exemption built into the enrichment layer. Any finding marked critical_issues survives every dedup and rerank step, by design. The exemption is explicit and tested. The intent: AI-powered ranking saves time on the medium-severity tail, never on the criticals.
How is reranking different from sorting by severity?
Severity sort puts every CRITICAL above every HIGH above every MEDIUM. It ignores whether the finding is relevant to your codebase. A CRITICAL finding in a test file ranks above a HIGH finding in production code. Reranking by project signature adds the missing context: below CRITICAL, production-path findings score higher than test-path findings of equal severity, because the project signature emphasizes the codebase's real entry points. CRITICAL findings are exempt from reranking and always surface in critical_issues.
Can I see the raw findings if I want them?
Yes. BrassCoders writes detailed_analysis.yaml alongside ai_instructions.yaml. The detailed file contains every raw finding from every scanner. The AI-readable file contains the ranked critical_issues plus a finding_summary section. Both files persist after the scan completes.
How does the project signature work?
BrassCoders derives a ≤7,500-character project signature from the README, manifest (pyproject.toml, package.json), entrypoint, and top-level filenames. The signature is sent to the gateway, which passes it to a hosted embedding model alongside the scanner findings. The model embeds each finding and the signature, and the cosine similarity between the two becomes the finding's relevance score. The raw source code never leaves the machine.