Image by Nokia621, from Wiki Commons

Meta Emails Reveal Torrenting Of Pirated Books For AI Training


Last Updated: Feb 10, 2025

Written by Kiara Fabbri, Multimedia Journalist
Fact-Checked by Justyn Newman, Lead Cybersecurity Editor

Newly unsealed emails, which book authors are calling the “most damning evidence” against Meta, have surfaced in an ongoing copyright lawsuit, as first reported by Ars Technica.

In a Rush? Here are the Quick Facts!

Authors allege Meta torrented at least 81.7 terabytes of pirated books from shadow libraries like LibGen and Z-Library.
Internal emails show Meta employees raised legal concerns about torrenting and seeding copyrighted material.
Meta allegedly concealed torrenting by avoiding Facebook servers and minimizing seeding activity.

According to Ars Technica, the authors allege that Meta illegally trained its AI models on pirated books, and the emails reveal internal concerns about the legality of torrenting and seeding copyrighted material.

Last month, Meta admitted to torrenting a controversial dataset known as LibGen, which contains tens of millions of pirated books.

However, details remained unclear until the unredacted emails were made public.

According to the authors’ court filing, Meta torrented “at least 81.7 terabytes of data across multiple shadow libraries through the site Anna’s Archive, including at least 35.7 terabytes of data from Z-Library and LibGen.” Additionally, “Meta also previously torrented 80.6 terabytes of data from LibGen.”

“The magnitude of Meta’s unlawful torrenting scheme is astonishing,” the authors’ filing stated, noting that even “vastly smaller acts of data piracy—just .008 percent of the amount of copyrighted works Meta pirated—have resulted in Judges referring the conduct to the US Attorneys’ office for criminal investigation.”

Ars Technica notes that the emails also reveal internal unease among Meta employees. In April 2023, research engineer Nikolay Bashlykov wrote, “Torrenting from a corporate laptop doesn’t feel right,” adding a smiley emoji.

He expressed concern about using Meta IP addresses “to load through torrents pirate content.” By September 2023, Bashlykov had dropped the humor, consulting Meta’s legal team and warning that “using torrents would entail ‘seeding’ the files—i.e., sharing the content outside, this could be legally not OK.”

Despite these warnings, the authors allege that Meta continued torrenting and seeding pirated content, and even attempted to conceal its activities.

According to Ars Technica, internal messages show that Meta avoided using Facebook servers to download the dataset to “avoid” the “risk” of anyone “tracing back the seeder/downloader,” as described by researcher Frank Zhang.

Michael Clark, a Meta executive, also admitted in a deposition that settings were modified “so that the smallest amount of seeding possible could occur.”

The authors now argue that Meta staff involved in the torrenting decision must be deposed again, as the new evidence allegedly “contradicts prior deposition testimony.”

For instance, while CEO Mark Zuckerberg claimed no involvement in using LibGen for AI training, unredacted messages suggest the “decision to use LibGen occurred” after “a prior escalation to MZ.”

Ars Technica reports that Meta has maintained that its AI training on LibGen constitutes “fair use” and denied any unlawful distribution of the authors’ works. However, the torrenting revelations have complicated its defense, allowing authors to expand their claims of direct copyright infringement.

As the case proceeds, Meta faces mounting scrutiny over its handling of copyrighted material, with the authors determined to hold the tech giant accountable for what they describe as a “massive unlawful torrenting scheme.”
