- Hachette Book Group and Cengage Group moved to join a California federal class action accusing Google of using copyrighted books to train its Gemini models.
- The publishers say Google downloaded books from pirate sites such as Z‑Library, OceanofPDF and WeLib and repeatedly copied them into training systems.
- The suit alleges Google’s C4 dataset drew from Z‑Library and at least 28 other piracy-linked sites, and that the copyright symbol appears more than 200 million times in C4.
- The publishers request statutory damages, injunctions, destruction of unauthorized copies and disclosure of which books trained Gemini.
- The complaint and the docket for the consolidated 2023 class action are publicly available.
On Thursday, major publishers Hachette Book Group and Cengage Group filed a motion to intervene in a federal class action in California that accuses Google of copying books to train its Gemini AI models. The publishers attached a formal complaint that lays out their claims and seeks relief.
The filing says Google took content without licenses, stating that the company “chose to steal a massive body of content from Plaintiffs and the Class to train its AI model,” and alleges infringement “at every stage” of model development. The publishers claim Google downloaded works from pirate repositories, copied them into memory, converted them to readable formats, and included them in training sets for successive models.
The complaint singles out Google’s C4 training dataset and alleges it contains works scraped from Z‑Library and at least 28 other sites the U.S. government has linked to piracy. The filing states that “The copyright symbol (©) appears more than 200 million times in the C4 dataset.” It also notes copies came from domains now displaying federal seizure notices and from subscription libraries such as Scribd.
The publishers ask the court for statutory damages, injunctions to stop further use, an order to destroy unauthorized copies, and disclosure of which books trained Gemini. They also cite a response from dataset provider Common Crawl that allegedly said, “You shouldn’t have put your content on the internet if you didn’t want it to be on the internet.”
The motion seeks to join an existing copyright action originally filed by authors in 2023; that consolidated case is available through the public docket. The filing follows a series of lawsuits over AI training data filed in 2023, in which judges have issued mixed rulings on fair use while criticizing long‑term retention of pirated works.
