Apple Faces Lawsuit Over Alleged Use of Pirated Books for AI Training

Two professors from the State University of New York (SUNY) College of Health Sciences have filed a class-action lawsuit against Apple Inc., alleging unauthorized use of their copyrighted works in training artificial intelligence systems. The complaint marks another escalation in the growing legal battles over AI training data sources.

The Allegations

Professors Susana Martinez-Conde and Stephen Macknik claim Apple used texts from Books3, a controversial dataset containing approximately 186,640 books sourced from pirated materials, to train its Apple Intelligence and OpenELM language models. Their books Champions of Illusion and Sleights of Mind were allegedly included without permission.

The lawsuit asserts Apple not only used the materials for model training but also employed them to test performance and filter copyrighted content from user-facing outputs. This follows Apple's April 2024 admission that it utilized The Pile dataset, which incorporated Books3 content.

Background on Books3

Books3 operated as a shadow library, obtaining materials primarily through the private BitTorrent tracker Bibliotik. The collection gained notoriety among AI researchers before being taken down in October 2023 following copyright complaints.

The dataset became particularly controversial because it:

  • Contained clearly copyrighted material
  • Was widely distributed among tech companies
  • Lacked proper attribution or compensation mechanisms

The case presents complex questions about:

  1. Whether AI training constitutes fair use
  2. How to compensate creators when works are used algorithmically
  3. What constitutes willful infringement in machine learning contexts

The plaintiffs seek:

  • A jury trial
  • Financial compensation
  • An injunction preventing future use of their works

If found liable for willful infringement, Apple could face statutory damages of up to $150,000 per infringed work.

The lawsuit arrives amid growing scrutiny of tech companies' data practices:

"This isn't just about compensation - it's about establishing ethical boundaries for how creative works are used in the AI era," said intellectual property attorney Mark Lemley.

The case follows similar disputes involving Midjourney and Anthropic, where courts have struggled with applying traditional copyright frameworks to AI development.

Market Context

While the complaint notes Apple's market value increased $200 billion following its AI announcement, analysts caution against attributing this solely to disputed training methods:

  • Apple's valuation grew consistently over five years
  • Multiple factors influence stock performance
  • Actual impact remains unclear pending legal outcomes

Apple has not yet issued a substantive response to the allegations.

Key Points:

  • Legal action: SUNY professors allege unauthorized use of their books in Apple's AI training
  • Controversial source: Books3 dataset contained pirated materials before takedown
  • High stakes: Potential penalties could reach $150,000 per infringed work
  • Broader implications: Case tests copyright boundaries in AI development

