Apple Faces Lawsuit Over Alleged Use of Pirated Books for AI Training
Two professors from the State University of New York (SUNY) Downstate Health Sciences University have filed a proposed class-action lawsuit against Apple Inc., alleging that the company used their copyrighted works without authorization to train artificial intelligence systems. The complaint adds to a growing number of legal battles over the sources of AI training data.
The Allegations
Professors Susana Martinez-Conde and Stephen Macknik claim Apple used texts from Books3, a controversial dataset containing approximately 196,640 books sourced from pirated materials, to train its Apple Intelligence and OpenELM language models. Their books Champions of Illusion and Sleights of Mind were allegedly included without permission.

The lawsuit asserts Apple not only used the materials for model training but also employed them to test performance and filter copyrighted content from user-facing outputs. This follows Apple's April 2024 admission that it utilized The Pile dataset, which incorporated Books3 content.
Background on Books3
Books3 was assembled from shadow-library content, obtained primarily through the private BitTorrent tracker Bibliotik. The collection gained notoriety among AI researchers before being taken down in October 2023 following copyright complaints.
The dataset became particularly controversial because:
- It contained clearly copyrighted material
- It was widely distributed among tech companies
- It lacked proper attribution or compensation mechanisms
Legal Implications
The case presents complex questions about:
- Whether AI training constitutes fair use
- How to compensate creators when works are used algorithmically
- What constitutes willful infringement in machine learning contexts
The plaintiffs seek:
- A jury trial
- Financial compensation
- An injunction preventing future use of their works
If found liable for willful infringement, Apple could face statutory damages of up to $150,000 per infringed work.
The lawsuit arrives amid growing scrutiny of tech companies' data practices:
"This isn't just about compensation - it's about establishing ethical boundaries for how creative works are used in the AI era," said intellectual property attorney Mark Lemley.
The case follows similar disputes involving Midjourney and Anthropic, where courts have struggled with applying traditional copyright frameworks to AI development.
Market Context
While the complaint notes that Apple's market value increased by $200 billion following its AI announcement, analysts caution against attributing this solely to the disputed training methods:
- Apple's valuation grew consistently over five years
- Multiple factors influence stock performance
- Actual impact remains unclear pending legal outcomes
The company has not yet issued a substantive response to the allegations.
Key Points:
- Legal action: SUNY professors allege unauthorized use of their books in Apple's AI training
- Controversial source: Books3 dataset contained pirated materials before takedown
- High stakes: Potential penalties could reach $150,000 per infringed work
- Broader implications: Case tests copyright boundaries in AI development