The Claude Code Leak Exposed Three Copyright Assumptions Every Developer Relies On

On March 31, 2026, Anthropic shipped Claude Code v2.1.88 to npm with a debug artifact still attached — a 59.8 MB source map file pointing to a zip archive on their Cloudflare R2 bucket. Inside: roughly 2,000 TypeScript files and 512,000 lines of proprietary source code. A security researcher flagged it on X. The post hit 28.8 million views. Within hours, the leaked codebase had accumulated over 84,000 stars and 82,000 forks on GitHub.
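For readers unfamiliar with the mechanism: a JavaScript source map is a JSON file that names the original files a bundle was compiled from, and can even embed their full text in a `sourcesContent` array. Per the reports above, the leaked map pointed at an external archive rather than embedding sources directly, but the exposure risk is the same. A minimal sketch of what such a file reveals:

```typescript
// Minimal sketch: listing what a JavaScript source map exposes.
// A .map file is JSON; "sources" names the original files, and the
// optional "sourcesContent" array embeds their full text verbatim.

interface SourceMap {
  version: number;
  sources: string[];
  sourcesContent?: (string | null)[];
}

function listExposedSources(mapJson: string): string[] {
  const map = JSON.parse(mapJson) as SourceMap;
  // Keep only files whose complete contents are embedded in the map.
  return map.sources.filter((_, i) => map.sourcesContent?.[i] != null);
}

// A toy map embedding one original TypeScript file:
const exampleMap = JSON.stringify({
  version: 3,
  sources: ["src/secret.ts"],
  sourcesContent: ["export const KEY = 'internal';"],
});

console.log(listExposedSources(exampleMap)); // ["src/secret.ts"]
```

This is why production release pipelines typically strip or withhold `.map` files: shipping one alongside minified code can hand over the original source wholesale.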

Anthropic called it a “release packaging issue caused by human error” and filed DMCA takedown notices. GitHub disabled an entire fork network of 8,100+ repositories — then had to walk it back after developers pointed out that many were legitimate forks of Anthropic’s own public Claude Code repo that had nothing to do with the leak. Boris Cherny, Anthropic’s head of Claude Code, acknowledged the overbroad takedowns were unintentional.

That would be a standard leak-and-cleanup story. What makes this one worth pulling apart is what happened next — and the three assumptions about code ownership it shattered.

A Two-Hour Rewrite That Broke the Framework

Developer Sigrid Jin took the leaked architecture and used OpenAI’s Codex to rewrite the entire project from TypeScript to Python. The result — Claw Code — hit 50,000 GitHub stars in two hours, one of the fastest accumulation rates the platform has recorded.

Anthropic’s DMCA takedowns targeted the leaked source. But Claw Code isn’t the leaked source. It’s a Python reimplementation of the same architecture, generated by a competing AI model. And that raises a question traditional copyright law wasn’t built to answer: when an AI rewrites proprietary code in a different language, who owns what?

Three layers of legal uncertainty stack on top of each other here — and every developer using AI to write code is implicitly betting on all three.

Layer 1: Your AI-Generated Code Might Not Be Yours

Most developers operate with a simple mental model of code ownership: I wrote it, I own it. The company I work for owns what I write at work. Open source has licenses. The authorship question is usually clear.

AI complicates this. In March 2026, the U.S. Supreme Court declined to hear the Thaler v. Perlmutter appeal, effectively confirming the lower court ruling: purely AI-generated works don’t qualify for copyright protection. The Copyright Office has clarified that protection only covers the portions of a work reflecting human creative contribution — and applicants must disclose AI involvement.

Now consider something Boris Cherny said publicly last year: that his contributions to Claude Code over a 30-day period were “100% written by Claude Code itself” and that he “didn’t even make small edits.” He was demonstrating product capability. But in a copyright framework, that statement matters. If the code’s expression was entirely determined by AI with no human creative control over the specific output, the Thaler principle suggests it may not be copyrightable.

Anthropic can argue that human creativity shows up in architecture decisions, prompt engineering, and output selection. The Copyright Office has acknowledged this reasoning. The problem is where the line falls. Writing a prompt that produces 500 lines of code and picking the best version — does that constitute creative control? What about three rounds of revision? What about committing the first output without review? No case law draws this boundary yet.

A file in the leaked source made this tension more pointed. undercover.ts — roughly 90 lines — implements a stealth mode activated when USER_TYPE === 'ant' (Anthropic employees). It injects a system prompt instructing the AI to strip all Anthropic-internal identifiers from git commits and pull requests when employees contribute to external open-source repositories. Internal codenames, Slack channels, references to Claude Code itself — all removed. For external users, the entire function is dead-code-eliminated.

The practical purpose is operational security. But the side effect is systematic removal of evidence that AI participated in code creation — the exact kind of evidence a court would need to assess how much of a codebase reflects human versus AI contribution.
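The pattern described above can be sketched as follows. This is a hypothetical reconstruction, not the leaked undercover.ts: in the real build, `USER_TYPE` would be a compile-time constant (for example, an esbuild `define`), so for external builds the `'ant'` branch is statically unreachable and the bundler eliminates it as dead code. An environment variable stands in for that here so the sketch runs as-is.

```typescript
// Hypothetical reconstruction of the env-gated stealth pattern.
// An env var stands in for what would be a build-time constant;
// with a compile-time define, bundlers can prove the 'ant' branch
// unreachable in external builds and strip it entirely.

const USER_TYPE: string = process.env.USER_TYPE ?? "external";

const STEALTH_PROMPT = [
  "When contributing to external repositories:",
  "- strip internal codenames and Slack channel references from commits",
  "- do not mention this tool in commit messages or pull requests",
].join("\n");

function buildSystemPrompt(base: string): string {
  if (USER_TYPE === "ant") {
    // Employee build: append the scrubbing instructions.
    return `${base}\n\n${STEALTH_PROMPT}`;
  }
  // External build: with a compile-time define, this is the only path left.
  return base;
}
```

The design choice worth noting is the dead-code elimination: external users cannot discover the stealth behavior by inspecting their own installed copy, because the branch simply is not present in it.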

Layer 2: AI Broke the Clean-Room Defense

The traditional way to legally replicate a competitor’s functionality is a clean-room rewrite. One team reads the original and writes a functional specification. A separate team — who has never seen the original code — implements from the spec alone. The legal logic is straightforward: if the second team never touched the original, any similarity in the output is coincidental or functionally necessary. Sega v. Accolade established this as strong evidence of fair use.

Claw Code didn’t attempt a clean-room process — Codex was pointed at the leaked source and told to translate. But even a rigorous clean-room attempt with AI fails for a more fundamental reason: models like Codex are trained on massive code corpora that almost certainly include publicly available portions of Claude Code’s codebase. The “clean” reimplementer has, by definition, already been exposed to the original during training. You can isolate the human team perfectly, but you can’t isolate the AI.

The chardet controversy illustrates this clearly. In March 2026, maintainer Dan Blanchard used Claude to rewrite the Python character-detection library (130+ million monthly downloads) and relicense it from LGPL to MIT. He instructed Claude not to reference LGPL source and started from an empty repository. Plagiarism detection showed less than 1.3% similarity with prior versions, and the rewrite achieved a 48x speed improvement.
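The public reports do not name the plagiarism-detection tool behind that sub-1.3% figure, so treat the following only as an illustration of what "similarity" can mean in this context: a crude token-set Jaccard score between two source files. Real detectors (Stanford's Moss, for instance) use winnowed fingerprints that survive renaming and reordering, which is precisely why low scores on AI rewrites are contested.

```typescript
// Crude illustration only: token-set Jaccard similarity between two
// source snippets. Production plagiarism detectors use far more robust
// fingerprinting; this just shows why a mechanical rewrite with renamed
// identifiers can score very low on naive measures.

function tokenize(src: string): Set<string> {
  // Extract identifier-like tokens; ignore punctuation and literals.
  return new Set(src.match(/[A-Za-z_][A-Za-z0-9_]*/g) ?? []);
}

function jaccard(a: string, b: string): number {
  const ta = tokenize(a);
  const tb = tokenize(b);
  const inter = [...ta].filter((t) => tb.has(t)).length;
  const union = new Set([...ta, ...tb]).size;
  return union === 0 ? 1 : inter / union;
}

const original = "def detect(buf): return sniff(buf)";
const rewrite = "def detect_encoding(data): return analyze(data)";
console.log(jaccard(original, rewrite).toFixed(2)); // "0.25"
```

A low score on a measure like this says the surface text diverged; it says nothing about whether the model's training exposure shaped the output, which is exactly the dispute that follows.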

The open-source community split. Mark Pilgrim, chardet’s original creator, filed an issue arguing that a decade of familiarity with the codebase plus an AI model trained on that codebase does not constitute a clean room. Richard Fontana, who co-authored GPLv3, took the opposite view — he found no basis for requiring the rewrite to remain under LGPL. Zoë Kooyman, executive director of the Free Software Foundation, was direct: a model that has already ingested the target code cannot produce a “clean” rewrite.

The core issue isn’t whether any specific rewrite is infringing. It’s that the entire concept of isolation — the foundation of clean-room legality — doesn’t map onto how AI models work. Traditional copyright assumes copying is a discrete event: either the reimplementer saw the original or they didn’t. An AI model trained on public code turns that binary into a spectrum. It “saw” the code during training, but that doesn’t mean it “remembers” it, and remembering doesn’t guarantee reproduction. You also can’t prove it didn’t.

Layer 3: Anthropic Needs Both Sides of the Same Argument

This is the layer with the sharpest edges. Anthropic is simultaneously arguing two positions that exist in tension with each other.

In June 2025, a U.S. District Court ruled in Bartz v. Anthropic that using copyrighted books to train Claude was “exceedingly transformative” fair use — the input is books, the output is a chat model, and they don’t compete in the same market. That ruling matters enormously for every AI company, because it provides a legal foundation for training on copyrighted data.

But when Codex took Claude Code’s source and generated a functionally equivalent Python version, Anthropic responded with DMCA takedowns. The underlying logic of the two positions collides: if “AI learning from copyrighted works to produce new output” is transformative fair use (Anthropic’s training-data defense), then “AI learning from Claude Code to produce a Python rewrite” should follow the same principle.

There are legitimate distinctions. Bartz involved training a general model on millions of books — high transformativeness, no direct market substitution. Claw Code is a targeted rewrite of a specific product that directly competes with the original. Under the four-factor fair use test, market impact is typically the most decisive consideration, and on that axis the two cases look quite different.

But the tension doesn’t resolve cleanly. The stronger Anthropic’s “training is transformative use” argument becomes as legal precedent, the harder it gets to prevent others from applying the same logic to AI-assisted rewrites of Anthropic’s own products. The legal moat you build for yourself is simultaneously a road your competitors can use.

One case to watch: Doe v. GitHub, the Copilot class action, had oral arguments before the Ninth Circuit in February 2026 and hasn’t been decided yet. The central question — whether AI coding tools that strip license information from training data violate the DMCA — will directly affect how all three layers shake out.

What This Means If You Ship Code

The three layers share a pattern: foundational copyright concepts — authorship, copying, transformative use — lose their clear boundaries when AI is involved. The legal system hasn’t caught up. The risk sits with practitioners.

If your product codebase is substantially AI-generated, the “all code is proprietary property of this company” clause in your contracts may rest on untested legal ground. Design documents, code review records, and architecture decision logs are no longer just engineering best practices — they’re potential evidence of human creative control.

If you rely on closed source as a competitive moat, the Claude Code incident demonstrates that AI tools can produce functionally equivalent alternatives from public descriptions alone, even without a leak. Product defensibility needs to rest on dimensions harder to replicate: data flywheels, user ecosystems, integration depth, distribution.

If you maintain open-source projects, the chardet incident is a preview. AI-assisted rewrite-and-relicense is neither clearly prohibited nor clearly permitted. Tracking Doe v. GitHub and the chardet dispute matters more than any technical trend prediction right now.

Anthropic just happened to be the first company pushed under the spotlight. Every developer writing code with AI is making the same unverified legal bets, every day.

FAQ

Who owns code written by AI tools like Claude Code or GitHub Copilot?

Under current U.S. law, purely AI-generated code without meaningful human creative contribution may not qualify for copyright protection. The Supreme Court’s 2026 refusal to hear Thaler v. Perlmutter confirmed this principle. Code qualifies for protection when humans exercise creative control over the output — through architecture decisions, iterative revision, and selection — but the exact threshold remains undefined by case law.

What happened in the Claude Code source code leak?

On March 31, 2026, Anthropic accidentally published a debug source map file in Claude Code v2.1.88 on npm, exposing roughly 512,000 lines of proprietary TypeScript across 2,000 files. The leaked code accumulated 84,000+ GitHub stars before Anthropic filed DMCA takedowns. An overbroad takedown initially disabled 8,100+ repositories, including legitimate forks unrelated to the leak.

Can AI legally rewrite proprietary code in a different programming language?

No court has ruled definitively on this question. Traditional clean-room reverse engineering provides legal cover when the reimplementer never saw the original code. But AI models trained on public code repositories have already been exposed to vast amounts of source code during training, undermining the isolation principle that makes clean-room defenses work. The legality depends on whether the output constitutes a protected “expression” or merely replicates unprotectable “function.”

What is the chardet AI license controversy?

In March 2026, chardet’s maintainer used Claude to rewrite the Python library from LGPL to MIT license, claiming a clean-room process. The open-source community split: some argued an AI trained on the original codebase cannot produce a “clean” rewrite, while others found no legal basis requiring the rewrite to remain under LGPL. No court has ruled on whether AI-assisted relicensing is valid.

How does Anthropic’s fair use defense conflict with its DMCA takedowns?

Anthropic successfully argued in Bartz v. Anthropic that training AI on copyrighted books is “exceedingly transformative” fair use. But when OpenAI’s Codex was used to rewrite Claude Code’s source into Python, Anthropic filed DMCA takedowns. The tension: the stronger the “training is fair use” precedent becomes, the harder it is to prevent competitors from applying the same argument to AI-assisted rewrites of Anthropic’s own code.

What is the Doe v. GitHub Copilot case and why does it matter?

Doe v. GitHub is a class action alleging that GitHub Copilot strips license information from open-source training data, violating the DMCA. After most claims were dismissed in 2024, breach of contract and license violation counts survived. Oral arguments were held at the Ninth Circuit in February 2026 with no ruling yet. The outcome will shape whether AI coding tools can legally train on and reproduce open-source code.

How should companies protect AI-generated code ownership?

Maintain evidence of human creative involvement: architecture design documents, code review records, decision logs, and iterative revision history. These materials demonstrate the human contribution that copyright law requires. Relying solely on the “we wrote the prompts” argument may not be sufficient — courts will evaluate whether humans exercised creative control over the specific expression of the code, not just the high-level intent.

What does undercover.ts reveal about AI attribution in code?

The leaked Claude Code source contained a file called undercover.ts that activates a stealth mode for Anthropic employees contributing to external repositories. It instructs the AI to remove all Anthropic-internal references from commits and pull requests. While operationally defensible, it systematically removes evidence of AI involvement — the type of evidence courts would need to assess human versus AI contribution for copyright purposes.
