Question 1

How do AI coding assistants cause secret leakage?

Accepted Answer

Three mechanisms. First, AI assistants trained on public code have memorized credential patterns from leaked repositories and can reproduce them verbatim when the surrounding context triggers recall. Second, AI-generated code often hardcodes credentials for convenience, especially in configuration stubs and test scaffolding. Third, developers paste proprietary code or credentials into AI assistant prompts, which sends the data to third-party servers. Of the three, BrassCoders detects the second (hardcoded credentials in source) at scan time.

Question 2

What happened at Samsung with AI assistant credential leakage?

Accepted Answer

In March 2023, Samsung engineers used ChatGPT to assist with code review and debugging. In the process, they pasted proprietary source code and internal meeting notes, including a database schema, into the ChatGPT interface. Samsung confirmed the incident and temporarily banned AI assistant use internally. The incident demonstrated that anything shared with an AI assistant travels to third-party servers, which makes credentials embedded in pasted code particularly sensitive.

Question 3

Can LLMs actually reproduce credentials they were trained on?

Accepted Answer

Yes. Carlini et al. (2021 and 2023) demonstrated that language models memorize and can reproduce verbatim sequences from their training data, including email addresses, phone numbers, and API keys from leaked GitHub repositories. The 2023 paper quantified the memorization rate across model sizes and showed it increases with model scale. Builders should treat AI-generated code with embedded secret-shaped strings as a red flag, not a false positive.

Question 4

What does BrassCoders detect in this category?

Accepted Answer

BrassCoders bundles detect-secrets (Yelp's open-source credential scanner) plus a custom secret-pattern scanner. Together they flag high-entropy strings in source code, known credential prefixes (AWS access key IDs starting with AKIA, GitHub personal access tokens starting with ghp_, Stripe keys, Anthropic keys, and others), private key PEM blocks, and .env-style assignments in source files. Every flagged credential should be treated as compromised and rotated, even if it looks like a placeholder.
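
To make the two detection ideas concrete (known prefixes and high entropy), here is a minimal Python sketch. It is not the BrassCoders or detect-secrets implementation: the pattern set, the function names, the 20-character minimum, and the 4.0-bit entropy threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; production scanners cover many more credential types.
PATTERNS = {
    # AWS access key ID: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # GitHub classic personal access token: "ghp_" followed by 36 alphanumerics.
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # Header line of a PEM private key block (RSA, EC, OPENSSH, or unlabelled).
    "pem_private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}

# Candidate tokens for the entropy check: long runs of key-like characters.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def scan_line(line, entropy_threshold=4.0):
    """Return the names of every check that fires on a single source line."""
    findings = [name for name, pattern in PATTERNS.items() if pattern.search(line)]
    tokens = CANDIDATE.findall(line)
    if any(shannon_entropy(token) >= entropy_threshold for token in tokens):
        findings.append("high_entropy_string")
    return findings
```

Calling `scan_line` on each line of a file gives a rough picture of what such a scanner reports. The entropy check is the noisy half: a lower threshold catches more real keys but also flags hashes and long identifiers, so the value above should be tuned rather than trusted.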

Question 5

What is GitGuardian's State of Secrets Sprawl?

Accepted Answer

GitGuardian's annual report is the most-cited primary-source measurement of credential leakage in public repositories. The 2024 edition found over 12.8 million secrets exposed in public GitHub commits during 2023. It tracks trends by credential type (cloud keys, database URLs, API tokens), industry, and repository age. The findings are based on GitGuardian's real-time scanning of the public GitHub event stream.

Question 6

How do I prevent AI tools from sending credentials to third-party servers?

Accepted Answer

Four controls. First, never paste credentials, database URLs, or proprietary configuration into an AI assistant chat interface. Second, when working with sensitive code, prefer the assistant's VS Code extension or other IDE integration over the web interface; some configurations offer local context modes. Third, keep secrets in .env files or a secrets manager, add .env files to your .gitignore, and never hardcode secrets in source. Fourth, run a BrassCoders scan on every commit to catch hardcoded credentials before they reach a shared repository.
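
As one way to apply the third control, the Python sketch below reads a secret from the process environment and fails fast when it is missing, so no literal value needs to appear in source. The helper name `require_secret` and the variable name `DATABASE_URL` are hypothetical and chosen only for illustration.

```python
import os


def require_secret(name, env=None):
    """Return the named secret from the environment, failing fast if it is absent."""
    # Default to the real process environment; a dict can be passed in for tests.
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        # Raise early rather than falling back to a hardcoded default.
        raise RuntimeError(f"Missing required secret: {name}")
    return value


# Example call (DATABASE_URL is a placeholder name, not a required convention):
# db_url = require_secret("DATABASE_URL")
```

Failing loudly on a missing variable removes the temptation to add a default value in code, which is where hardcoded credentials tend to creep back in.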

Will My AI-Generated Code Leak My Credentials?

Secret Leakage Is The Highest-Frequency AI-Coder Bug Category

📊 GitGuardian — State of Secrets Sprawl 2024

🏢 Samsung / ChatGPT Credential Disclosure (2023)

📄 Carlini et al. — Quantifying Memorization Across Neural Language Models

🔧 detect-secrets (Yelp)
