What Claude Code's Leaked Source Reveals About Defending AI Products

When Anthropic accidentally shipped Claude Code’s full TypeScript source via npm on March 31, 2026, the engineering community didn’t just find code. They found a six-layer defense system designed to answer a question every AI company faces: how do you stop someone from cloning your product, distilling your model, or impersonating your client?

The answer, it turns out, is not one mechanism but six — layered from compile time to runtime, from the JavaScript engine down to native Zig code, each operating independently so that breaking one doesn’t compromise the others. Security researchers and engineers have since dissected the architecture extensively. What emerges is a blueprint for how AI products will need to think about protection as the stakes of model distillation rise.

Layer 1: Code That Was Never There

The foundation of Claude Code’s defense is deceptively simple: compile-time dead code elimination. Anthropic maintains a single codebase that produces two different binaries — one for internal employees, one for the public.

The mechanism relies on process.env.USER_TYPE, injected at build time via Bun’s --define flag. Every branch that checks USER_TYPE === 'ant' gets evaluated at compile time. In the external build, those branches resolve to false and the bundler eliminates the entire code path — not just the condition, but every function and module only reachable through it.

The scale tells the story: USER_TYPE === 'ant' appears 357 times across 165 files. Internal engineers get debug tools, an undercover mode for contributing to open-source repos without attribution, a bash command classifier, and connector text summaries. External users get none of it — not because it’s hidden behind a flag, but because it physically doesn’t exist in their binary.
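The pattern can be sketched in a few lines of TypeScript. This is an illustrative reconstruction, not Anthropic's actual code — the function names are invented, and only the `USER_TYPE === 'ant'` check and the `--define` mechanism come from the source:

```typescript
// Sketch of compile-time dead code elimination (illustrative names).
// Building with `bun build --define 'process.env.USER_TYPE="external"'`
// turns the env read into a literal constant, letting the bundler prove
// the internal branch is unreachable and strip it from the output.

const USER_TYPE: string = "external"; // stand-in for the injected constant

function internalOnlyTools(): string[] {
  // In a real external build, this function is never emitted at all —
  // the only call site below constant-folds away.
  return ["debug tools", "undercover mode", "bash command classifier"];
}

function availableFeatures(): string[] {
  const features = ["core agentic loop"];
  if (USER_TYPE === "ant") {
    // Constant-folds to `false` in the external build; the bundler deletes
    // this block and, with it, the only reference to internalOnlyTools().
    features.push(...internalOnlyTools());
  }
  return features;
}
```

The key property: after substitution, nothing in the external bundle even names `internalOnlyTools`, so there is nothing to find or patch.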

Bun provides a second primitive: feature() from bun:bundle. Flags like ANTI_DISTILLATION_CC and NATIVE_CLIENT_ATTESTATION are replaced with boolean constants at build time, giving fine-grained control over which defense layers ship in which build variant. And a CI pipeline scans the final external binary against a list of internal codenames (like Capybara and Tengu) — any residual string fails the build.

The design philosophy is clear: compile-time elimination is more reliable than runtime checks. You can patch a runtime flag. You can’t patch code that was never compiled into the binary in the first place.

Layer 2: Zig-Level Client Attestation

This is the most architecturally interesting layer, and it depends on a technical detail about how Claude Code is built.

Bun — the JavaScript runtime Claude Code uses — is written in Zig, a systems-level language in the same category as C and Rust. When Bun compiles Claude Code into a standalone binary (bun build --compile), the resulting executable includes Bun’s entire HTTP networking stack, which runs as native Zig code outside the JavaScript engine’s memory space.

Anthropic exploits this architecture for attestation. When Claude Code builds an API request, the JavaScript layer inserts a fixed-length placeholder (cch=00000) into the request body. Before the request actually hits the network, Bun’s native HTTP stack — running in Zig, below JavaScript — locates the placeholder in the serialized byte stream and overwrites it with a computed attestation hash. The server validates this hash to confirm the request came from a genuine Claude Code binary.

Several engineering details make this hard to circumvent. The placeholder and hash are the same length, so Content-Length doesn’t change and no buffer reallocation is needed. The token is embedded in the JSON body (not an HTTP header), so standard proxies and API gateways can’t observe or strip it at the transport layer. And critically, JavaScript-level interception — overwriting fetch, monkey-patching http.request, installing middleware — cannot see or modify what happens in the Zig layer.
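The length invariant is worth seeing concretely. In Claude Code the swap happens in Bun's native Zig HTTP stack, below JavaScript; this TypeScript sketch (with a made-up hash value) only demonstrates why a same-length replacement leaves Content-Length untouched:

```typescript
// Illustrative sketch of the same-length placeholder swap.
// The real replacement runs in native Zig code over the serialized
// byte stream; this version just shows the length invariant.

const PLACEHOLDER = "cch=00000"; // fixed-length placeholder from the source

function injectAttestation(serializedBody: string, hash: string): string {
  // The hash must be exactly as long as the placeholder's value portion,
  // so the byte count — and therefore Content-Length — never changes
  // and no buffer reallocation is needed.
  if (hash.length !== 5) throw new Error("attestation hash must be 5 chars");
  return serializedBody.replace(PLACEHOLDER, `cch=${hash}`);
}
```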

The trade-off is explicit: Claude Code is permanently locked to Bun. Node.js and Deno can’t provide equivalent native HTTP stack injection. That’s a vendor lock-in decision made deliberately in exchange for a security property no other JavaScript runtime can offer.

Layer 3: Message Fingerprinting

Attestation proves the client is real. Fingerprinting detects whether the message content was tampered with in transit.

The algorithm is intentionally simple: take the 4th, 7th, and 20th characters of the first user message, concatenate them with a hardcoded salt and the client version, SHA-256 hash the result, and take the first 3 hex characters. That 3-character fingerprint (12 bits, 4,096 possible values) gets appended to the attribution header.
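The described algorithm is simple enough to sketch directly. Note the salt below is a placeholder — the real salt is hardcoded in Claude Code's source and not reproduced here — and the 1-indexed character positions are an assumption from the wording above:

```typescript
import { createHash } from "node:crypto";

// Sketch of the fingerprint scheme described above.
// SALT is a stand-in; character indexing assumed 1-based ("4th" = index 3).
const SALT = "example-salt";

function messageFingerprint(firstUserMessage: string, clientVersion: string): string {
  // Pick the 4th, 7th, and 20th characters; short messages fall back
  // to empty strings for missing positions.
  const picked = [3, 6, 19].map((i) => firstUserMessage[i] ?? "").join("");
  const digest = createHash("sha256")
    .update(picked + SALT + clientVersion)
    .digest("hex");
  return digest.slice(0, 3); // 12 bits: 4,096 possible values
}
```

The server can recompute the same function from the message and version it receives; a mismatch means something in between rewrote the message.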

Twelve bits isn’t a cryptographic guarantee — it’s a statistical signal. The server can recompute the fingerprint from the message and version, and if it doesn’t match, the message was modified. Run this across thousands of requests from the same API key and you get a clear picture: a high mismatch rate means traffic is flowing through a proxy that’s injecting system prompts or rewriting user messages.

A comment in the source reveals one constraint: the fingerprint algorithm must stay synchronized across first-party and third-party APIs (Bedrock, Vertex, Azure). That cross-team coordination requirement explains the simplicity — a more complex algorithm would be harder to keep consistent across different language implementations and partner platforms.

Layer 4: Anti-Distillation

The first three layers verify identity and integrity. Layer 4 addresses a different threat: even legitimate users might be recording Claude’s outputs to train a competing model.

Two mechanisms work in tandem — one poisoning the input, one watermarking the output.

On the input side, when conditions are met, Claude Code adds anti_distillation: ['fake_tools'] to the API request, telling the server to inject fake tool definitions into the system prompt. These look like real tool specifications but describe capabilities that don’t exist. Anyone recording request-response pairs for training data gets a poisoned dataset — the distilled model will try to invoke tools that were never real.

The trigger has three independent gates: a compile-time feature flag (ANTI_DISTILLATION_CC), an entrypoint check (CLI only, not SDK), and a GrowthBook remote feature flag (tengu_anti_distill_fake_tool_injection). Three gates means Anthropic can enable or disable this server-side without shipping a new client version.
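The triple-gate structure can be sketched as a single conjunction. The flag names below come from the source; how the gates combine is an assumption for illustration:

```typescript
// Sketch of the three-gate trigger described above (combining logic assumed).

const ANTI_DISTILLATION_CC = true; // compile-time feature flag (stand-in)

type Entrypoint = "cli" | "sdk";

function shouldInjectFakeTools(
  entrypoint: Entrypoint,
  remoteFlagEnabled: boolean, // GrowthBook: tengu_anti_distill_fake_tool_injection
): boolean {
  // All three gates must pass. The remote flag is the operationally
  // important one: it lets the server flip this behavior without
  // shipping a new client binary.
  return ANTI_DISTILLATION_CC && entrypoint === "cli" && remoteFlagEnabled;
}
```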

On the output side, Claude’s thinking blocks and connector text carry cryptographic signatures bound to the API key and model that generated them. These signatures create three constraints. Switch API keys (e.g., after /login), and old thinking blocks become invalid — the client strips them. Fall back from one model to another under load, and thinking blocks from model A fail validation when sent to model B. And for anyone systematically collecting training data, the signatures create an audit trail tying every thinking block to its origin API key.
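One way to bind a signature to an API key and model is a keyed MAC. The sketch below is NOT Anthropic's actual signature format — it is only an illustration of the binding property the article describes, using an HMAC over the model name and content:

```typescript
import { createHmac } from "node:crypto";

// Illustrative HMAC scheme: a signature valid only for the (apiKey, model)
// pair that produced it. The real scheme in Claude Code is not public.

function signThinkingBlock(content: string, apiKey: string, model: string): string {
  return createHmac("sha256", apiKey).update(`${model}\n${content}`).digest("hex");
}

function validateThinkingBlock(
  content: string,
  signature: string,
  apiKey: string,
  model: string,
): boolean {
  return signThinkingBlock(content, apiKey, model) === signature;
}
```

With a scheme of this shape, switching API keys or falling back to a different model invalidates every previously issued signature, which is exactly the behavior described above.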

Both mechanisms only activate for first-party CLI sessions. Third-party users on Bedrock or Vertex are already authenticated through their cloud provider — the distillation risk concentrates on direct API access.

Layer 5: Anti-Debug and Token Protection

This layer defends against a subtler attack vector: prompt injection that tricks the model into running shell commands like gdb -p $PPID to scrape API tokens from process memory.

Claude Code uses Bun’s FFI to call Linux’s prctl system call with PR_SET_DUMPABLE = 0, making the process immune to ptrace attachment from same-UID processes. Even if an attacker can execute code in the same user context, they can’t attach a debugger to read heap memory.

Token lifecycle management adds another layer. In Claude Code Remote (the cloud container mode), the session token is read from a file, then the file is immediately deleted — the token exists only in memory, with no filesystem artifact. The deletion timing is deliberate: it happens only after the relay successfully starts, so a failed startup can retry with the token still on disk.

The entire layer follows a fail-open principle: if prctl isn’t available (non-Linux platforms), it silently skips. If the token file can’t be deleted, it logs a warning and continues. Security mechanisms never block the core coding workflow — a design choice that reflects Claude Code’s identity as a productivity tool first, not a security product.
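The fail-open pattern reduces to a try/catch whose failure branch warns instead of aborting. This sketch uses hypothetical names; the actual prctl call happens via Bun's FFI on Linux:

```typescript
// Sketch of the fail-open pattern described above (hypothetical names).
// Hardening is attempted; any failure degrades to a warning, never a crash.

function hardenProcess(applyPrctl: () => void): { hardened: boolean; warning?: string } {
  try {
    applyPrctl(); // on Linux: prctl(PR_SET_DUMPABLE, 0) via Bun FFI
    return { hardened: true };
  } catch (err) {
    // Fail open: record the problem and let the coding session continue.
    return { hardened: false, warning: String(err) };
  }
}
```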

Layer 6: Gateway Detection

The outermost layer is pure reconnaissance. After every API response, Claude Code scans HTTP response headers for fingerprints of known AI proxy gateways — LiteLLM, Helicone, Portkey, Cloudflare AI Gateway, Kong, and Braintrust each leave characteristic header prefixes. For SaaS gateways like Databricks, it checks the API base URL’s domain suffix.
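A header scan like this is straightforward to sketch. The prefixes below are the ones listed in this article's FAQ; the matching logic itself is an assumption:

```typescript
// Sketch of gateway detection via response-header prefixes.
// Prefixes from the article's FAQ; scanning logic is illustrative.

const GATEWAY_PREFIXES: Record<string, string> = {
  "x-litellm-": "LiteLLM",
  "helicone-": "Helicone",
  "x-portkey-": "Portkey",
  "cf-aig-": "Cloudflare AI Gateway",
  "x-kong-": "Kong",
  "x-bt-": "Braintrust",
};

function detectGateway(headers: Record<string, string>): string | null {
  for (const name of Object.keys(headers)) {
    const lower = name.toLowerCase(); // header names are case-insensitive
    for (const [prefix, gateway] of Object.entries(GATEWAY_PREFIXES)) {
      if (lower.startsWith(prefix)) return gateway;
    }
  }
  return null; // no known gateway fingerprint present
}
```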

Detection results go to telemetry. The source shows no blocking logic — this is monitoring, not enforcement. But the strategic purpose is clear: AI proxy gateways are the infrastructure layer that turns casual API usage into systematic distillation. They enable recording request-response pairs at scale, distributing load across multiple API keys, and injecting or filtering content in transit. By identifying gateway traffic, Anthropic builds the data foundation for targeted server-side countermeasures without tipping off the gateway operator.

This embodies the principle that runs through all six layers: the client tags, the server decides. Every client-side mechanism attaches information to requests — attestation hashes, fingerprints, gateway markers. The final trust judgment always happens server-side, where logic can be updated without shipping a new binary.

What Builders Should Take From This

The architecture reveals five principles worth internalizing if you’re building AI products.

First, compile-time beats runtime for anything you want to keep private. Runtime flags can be patched. Code that never made it into the binary can’t be reverse-engineered from the binary.

Second, exploit your runtime’s unique properties. Anthropic chose Bun specifically because its Zig-based HTTP stack enables attestation below the JavaScript layer. The right runtime choice isn’t just about performance — it’s about what security primitives your architecture makes possible.

Third, design for adversary economics, not absolute security. No single layer is unbreakable. But an attacker who needs to defeat compile-time elimination, Zig attestation, message fingerprinting, anti-distillation, anti-debug, and gateway detection simultaneously faces compounding costs. The goal is making your product a harder target than the alternatives.

Fourth, never let security mechanisms degrade the user experience. Every layer in Claude Code fails open. A broken attestation check doesn’t block your coding session. This is easy to state and hard to maintain as security layers accumulate — but the moment a protection mechanism causes a user-visible failure, it’s creating negative value.

Fifth, separate tagging from enforcement. Client-side code in the wild will be reverse-engineered. If your client makes blocking decisions, attackers know exactly what to patch. If your client only tags and your server decides, you can change enforcement logic at any time without a client update.

The irony is that the leak itself served as a stress test. Even with the complete source code exposed, the server-side attestation verification and GrowthBook feature flags remain the final line of defense — exactly as the architecture intended.

FAQ

What security mechanisms does Claude Code use to prevent cloning?

Claude Code uses a six-layer defense-in-depth system: compile-time dead code elimination that removes internal features from public builds, Zig-level cryptographic client attestation below the JavaScript engine, message fingerprinting for tamper detection, anti-distillation mechanisms that poison training data with fake tool definitions, anti-debug protections that block memory inspection, and gateway detection that identifies proxy infrastructure used for systematic data collection.

How does Claude Code’s Zig attestation work?

Claude Code is built on Bun, a JavaScript runtime written in Zig. When making API requests, the JavaScript layer inserts a placeholder token in the request body. Before the request hits the network, Bun’s native Zig HTTP stack — running outside JavaScript’s memory space — overwrites the placeholder with a computed attestation hash. The server validates this hash to confirm the request came from a genuine Claude Code binary, not a spoofed client.

What is AI model distillation and why do companies try to prevent it?

Model distillation is the process of training a smaller, cheaper model to replicate the behavior of a larger, more capable one by feeding it the larger model’s inputs and outputs as training data. Companies like Anthropic invest heavily in training frontier models, so competitors recording API traffic to build equivalent products at a fraction of the cost represents a direct business threat. Anti-distillation mechanisms like fake tool injection and thinking block signatures make collected training data unreliable or traceable.

What are anti-distillation fake tools in Claude Code?

When enabled, Claude Code instructs the API server to inject fake tool definitions — plausible-looking but non-functional tool specifications — into the system prompt. Anyone recording Claude Code’s API traffic for training data captures these fake definitions. A model trained on this poisoned data will attempt to invoke tools that don’t exist, degrading its reliability. The mechanism is controlled by three independent gates: a compile-time flag, an entrypoint check, and a server-side feature flag.

Why does Claude Code use Bun instead of Node.js?

Beyond performance benefits, Bun’s architecture enables a security property Node.js cannot provide. Because Bun is written in Zig, its HTTP networking stack runs as native code outside the JavaScript engine’s memory space. This allows Claude Code to inject attestation tokens at the network layer where JavaScript-level interception (overwriting fetch, monkey-patching http.request) cannot detect or modify them. This is an explicit vendor lock-in trade-off: security capability in exchange for runtime portability.

How does Claude Code detect AI proxy gateways?

Claude Code scans HTTP response headers after every API call, looking for characteristic prefixes from known gateway services: LiteLLM (x-litellm-), Helicone (helicone-), Portkey (x-portkey-), Cloudflare AI Gateway (cf-aig-), Kong (x-kong-), and Braintrust (x-bt-). For SaaS providers like Databricks, it checks the API base URL domain. Detection results are logged to telemetry for server-side analysis rather than triggering immediate blocking.

What happened when Claude Code’s source code leaked?

On March 31, 2026, Anthropic accidentally shipped a 59.8 MB source map file in Claude Code v2.1.88 on npm, exposing roughly 512,000 lines of TypeScript across 2,000 files. The leak revealed the complete defense architecture, internal codenames (Capybara, Tengu), and anti-distillation mechanisms. Anthropic’s DMCA response initially took down 8,100+ GitHub repositories before correcting the overbroad takedowns. The incident served as an unintended stress test of the defense system’s server-side resilience.

What is the fail-open principle in security design?

Fail-open means that when a security mechanism breaks, it degrades gracefully rather than blocking the user. Claude Code applies this consistently: if Zig attestation fails, the request still goes through. If anti-debug protections aren’t available on the current platform, they silently skip. If a gateway can’t be identified, the request proceeds normally. This reflects a design philosophy where security never interferes with the core product experience — critical for developer tools where workflow interruption has immediate productivity costs.

How do feature flags control Claude Code’s security remotely?

Anthropic uses GrowthBook feature flags to remotely toggle security mechanisms without shipping client updates. Flags like tengu_anti_distill_fake_tool_injection and tengu-off-switch allow the server to enable or disable anti-distillation, attribution headers, and other defenses in real time. This means even with the client source code fully exposed, Anthropic can change what defenses are active — the client tags requests with whatever the server tells it to, and enforcement logic stays server-side.
