2025: the first real decisions
For two years, AI companies said "fair use protects us," while publishers and creators said "this is mass theft." In June 2025, federal courts in California finally spoke.
| Case | Result | Why |
|---|---|---|
| Bartz v. Anthropic (23 Jun 2025) | Fair use ✓ | Training = "spectacularly transformative"; no replication in outputs |
| Kadrey v. Meta (25 Jun 2025) | Fair use ✓ | Authors failed to prove market harm, despite pirated sources |
| Thomson Reuters v. ROSS (Feb 2025) | NOT fair use ✗ | Products competed directly; ROSS was a substitute for Westlaw |
What fair use is (for normal people)
Fair use is a US doctrine that says: "okay, you can use someone else's work without permission, if..."
Courts look at 4 things:
- Transformativeness — are you doing something new, or copying 1:1?
- Nature of the work — facts vs. creative works (like novels)
- How much you copied — the whole thing or a fragment?
- Market harm — did you kill the author's sales?
In the AI context: point 1 (transformativeness) and point 4 (harm) matter most.
Bartz v. Anthropic: "spectacularly transformative"
Authors sued Anthropic (makers of Claude) because the company used 7+ million books for training. Some were pirated copies.
Key quotes from the ruling:
- "Training a model to generate language is fundamentally different from reading a book"
- "It's like teaching children to write — you can't forbid using works for learning"
- "No evidence was shown that Claude replicates or substitutes the original works"
Kadrey v. Meta: when you can't prove harm
Meta trained LLaMA on books from "shadow libraries" (pirated repos). Authors including Sarah Silverman sued.
The difference in reasoning:
- Judge Chhabria did NOT hold that transformativeness automatically equals fair use
- He highlighted that LLMs can generate "millions of derivative works" in a fraction of the time a human author would need
- "Market dilution" could tip future cases against AI
Thomson Reuters v. ROSS: when fair use does NOT apply
ROSS Intelligence built a competitor to Westlaw (Thomson Reuters' legal database). They used Westlaw data for training.
Why they lost:
- Direct competition — the end product replaced Westlaw
- ROSS wasn't "generative" — it returned existing case summaries
- Points 1 (transformativeness) and 4 (harm) both went against them
Lesson: If you're building a substitute for the source product, fair use likely won't protect you.
NYT vs OpenAI: the case still ongoing
The New York Times sued OpenAI in December 2023. The case continues after the motion to dismiss was rejected in March 2025.
NYT's claims:
- OpenAI used millions of articles without permission or payment
- ChatGPT can generate near-verbatim reproductions of NYT articles
- Chatbots substitute for visits to nytimes.com (market harm)
This case could reach the Supreme Court. It will be pivotal for the entire industry.
What about images? Getty vs Stability AI
Getty Images sued Stability AI (Stable Diffusion) in the UK for using millions of images for training.
Why did Getty's UK case run into trouble?
- Evidentiary problems — difficulty proving training took place in the UK
- No witnesses — no one from Stability could describe the full process
- Late amendments — Getty didn't update its claim in time
Takeaway: Even large companies struggle to prove AI infringement. Documentation is key.
Music: Concord Music vs Anthropic
Music publishers (Universal, Concord, ABKCO) sued Anthropic for using song lyrics in Claude.
The court denied the publishers' request for a preliminary injunction; the core claims about training on lyrics continue.
Japan: "paradise for machine learning"
For contrast: Japan's 2018 copyright amendment (in force since 2019) created one of the world's most permissive regimes for AI training.
- Training AI = OK without rights holder permission
- Applies to commercial use too
- Arguably covers even materials from illegal sources (though obtaining pirated copies is itself prohibited)
- Exception: uses aimed at "enjoyment" of the work itself, or that would unreasonably harm the rights holder
Philosophy: Protection on outputs, not inputs. If the output harms the creator — then normal copyright rules kick in.
Where's the line? Practical boundaries
✅ Probably fair use:
- High transformativeness — training to create new kinds of content
- No replication — the model doesn't reproduce source material verbatim
- Legally obtained copies (though not sufficient on its own)
- No proven market harm — burden is on the plaintiffs
❌ Probably NOT fair use:
- Libraries of pirated copies — even if not all are used
- Output replicating works — verbatim quotes or substitutional replacements
- Direct competition — end product replaces the original
- Deliberate "style theft" — training on a small dataset of one artist's works
What this means for you (using Midjourney/SD)
When you might have a problem:
- You try to reproduce a specific named artist (e.g. "in style of Greg Rutkowski")
- You generate something that replicates a protected work
- You do this commercially at scale
How to protect yourself:
- Avoid prompts with living artists' names
- If something looks like a 1:1 copy — don't publish it
- Add your own value (edit, compositing, context)
Practical takeaways for AI companies
1. Documentation is critical
- Keep detailed records of data sources
- Document where training takes place (jurisdiction)
- Preserve proof of legal acquisition
2. Output quality matters
- Implement effective guardrails against replication
- Monitor for verbatim quote cases
- Test whether the model can reproduce protected works
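Monitoring for verbatim quotes can start with something as simple as an n-gram overlap check between model outputs and a corpus of protected texts. A minimal Python sketch; the 8-word window is an arbitrary illustrative threshold, not a legal standard, and the sample corpus is hypothetical:

```python
# Sketch: flag model outputs that share long verbatim word runs with a
# protected-text corpus. Window size (8 words) is an assumption for
# illustration only.

def ngrams(text: str, n: int = 8) -> set:
    """All n-word runs in `text`, lowercased, as joined strings."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output: str, corpus: list, n: int = 8) -> bool:
    """True if `output` shares any n-word run with any corpus document."""
    out_grams = ngrams(output, n)
    return any(out_grams & ngrams(doc, n) for doc in corpus)

# Hypothetical protected text and two model outputs:
corpus = ["it was the best of times it was the worst of times it was the age of wisdom"]
print(verbatim_overlap("he said it was the best of times it was the worst of times today", corpus))  # True
print(verbatim_overlap("the model produced an entirely original sentence about spring", corpus))     # False
```

Production systems would need normalization (punctuation, whitespace) and an efficient index rather than pairwise set intersection, but the principle is the same: long shared runs are a red flag worth human review.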
3. Market analysis
- Assess whether the end product competes with source works
- Document transformativeness
- Be ready to demonstrate absence of market harm
What comes next?
- Appeals are inevitable — the Anthropic and Meta cases will likely go higher
- Supreme Court may weigh in — experts expect a final resolution there
- Divergence between courts — individual judges see it differently
- No uniform standards — every case is assessed individually
Key narrative shift: A year ago, AI companies were confident fair use protected them. Today they know it depends on implementation details and evidence.
Checklist: how to stay protected
For content creators:
- ☐ Add "no AI training" clauses to your terms of use
- ☐ Use technical protection measures (robots.txt, authentication)
- ☐ Monitor major AI models for your content
- ☐ Consider a licensing strategy
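The "technical protection measures" item can be as simple as a robots.txt that opts out of AI training crawlers. A minimal sketch using user-agent tokens these companies have published; verify the current token names before relying on them, and remember robots.txt is advisory, not enforcement:

```
# robots.txt — opt out of common AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /
```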
For AI companies:
- ☐ Audit training data sources
- ☐ Strengthen output guardrails
- ☐ Prepare a legal strategy (transformativeness, lack of harm)
- ☐ Consider licensing where risk is high
Follow: @midjourneyartpl — effective use of AI.
FAQ
Can I use Midjourney commercially?
Yes, if you have a paid subscription. The training-data question is Midjourney's legal problem, not yours as a user. Just avoid replicating specific protected works.
What if I use a prompt "in style of [artist]"?
Style itself is not protected. But if the output looks like a copy of a specific work — that's a problem. The more you transform, the safer you are.
Should I be worried?
If you're an individual creator using AI for personal projects — no. If you're a company training your own models — yes, you need a lawyer.