Two big rulings: Courts are starting to expose AI piracy of copyrighted material


Two recent decisions in AI copyright infringement lawsuits trended in favor of copyright holders and against big AI corporations. Image by Moshe Harosh from Pixabay

Feb. 12, 2025 — Two recent federal court rulings have dealt a blow to the “fair use” argument major AI corporations have used to shield themselves from copyright infringement claims.

The rulings, while not final, are among the first significant decisions to come out of the dozens of copyright infringement lawsuits filed against AI developers in the past two years. Those lawsuits rest on the claim that major AI developers have illegally used copyrighted material as training data to manufacture their AI models, resulting in billions of dollars in benefit for the AI developers and zero dollars for the copyright holders.

On Tuesday, a US District Court judge in Delaware issued a partial summary judgment in favor of publisher Thomson Reuters in its copyright infringement lawsuit against the AI company Ross Intelligence. “None of Ross’s possible defenses holds water,” wrote the judge. “I reject them all.”

That ruling followed a legal setback handed last month to Meta, which was ordered to file full versions of in-house documents related to the development of its Llama AI model. The federal judge in that case called Meta’s reason for redacting the documents “preposterous” and said that Meta wanted to hide the documents only to “avoid negative publicity.”

Those documents turned out to be damaging, and possibly in legal terms as well. The documents contained exchanges between Meta developers that seemed to reveal an awareness of the unethical and potentially illegal nature of using pirated material to train an AI model.

Thomson Reuters v. Ross AI

In its lawsuit, Thomson Reuters, publisher of the leading legal research platform Westlaw, claimed that the Ross Intelligence AI development team illegally used Westlaw content to train its Ross AI model. Ross Intelligence offers AI-based legal research that directly competes with Westlaw. (The Free Law Project’s Court Listener platform maintains a full repository of documents in this case.)

US District Court of Delaware judge Stephanos Bibas wrote: “None of Ross’s possible defenses holds water” against accusations of copyright infringement.

This case holds a number of potential lessons for other AI copyright cases. The first lesson is that the issues at play are complex and take time. Judge Bibas initially denied Thomson Reuters’ motions for summary judgment back in 2023. In 2024 the judge, in his words, “studied the case materials more closely” and realized his prior ruling “had not gone far enough.”

Lesson two: Details specific to each copyright case really matter.

Case facts: Ross Intelligence initially asked Thomson Reuters to license Westlaw material to train its AI model. Thomson declined, knowing that Ross would likely use Thomson’s own material to compete against Westlaw. Ross then contracted with a third party, LegalEase, to parcel out Westlaw material in a slightly different form known as Bulk Memo questions. Those Bulk Memo questions were then used to train the Ross AI model. On its face, this could be construed as a kind of data money laundering.

“The dispute boils down to whether the LegalEase Bulk Memo questions copied Thomson Reuters’s headnotes or were instead taken from uncopyrightable judicial opinions.”

Looking painstakingly note by note, the judge found that of 2,830 legal headnotes, “Ross infringed 2,243 headnotes” belonging to Westlaw. He added that “actual copying is so obvious that no reasonable jury could find otherwise.”

Kadrey v. Meta

Kadrey v. Meta began as a copyright infringement lawsuit filed against Meta in June 2023 by the novelist Richard Kadrey, joined by Sarah Silverman and others. Over time the case has merged with other copyright infringement claims to become one of two big-tent lawsuits in the book subspecies of AI-copyright cases.

Kadrey is being heard in Northern California by U.S. District Court Judge Vince Chhabria. As part of the discovery process, Meta has been ordered to produce internal communications regarding the training data for its AI models. Meta’s lawyers asked to have this material sealed from public view.

Judge Chhabria, in a Jan. 8, 2025 ruling, denied that request with vigor:

“Meta’s request is preposterous. With one possible exception, there is not a single thing in those briefs that should be sealed… It is clear that Meta’s sealing request is not designed to protect against the disclosure of sensitive business information that competitors could use to their advantage. Rather, it is designed to avoid negative publicity. This is reflected in a statement by a Meta employee from one of the documents Meta seeks to seal: ‘If there is media coverage suggesting we have used a dataset we know to be pirated, such as LibGen, this may undermine our negotiating position with regulators on these issues.’”

A selection of those communications, filed as a case exhibit on Feb. 5, 2025, does seem to undermine Meta’s position. A sampling:

  • A senior research manager working on Meta’s Llama AI model: “I don’t think we should use pirated material. I really need to draw a line there.” (Oct. 19, 2022)

  • Two Meta employees exchange views: One says that “using pirated material should be beyond our ethical threshold.” Another responds, “You think it’s problematic to use even for this phase?” Yes, says the interlocutor. “SciHub, ResearchGate, LibGen are basically like PirateBay or something like that, they are distributing content that is protected by copyright and they’re infringing it.” (Oct. 19, 2022)

  • Another document appears to be notes from a Jan. 2023 meeting attended by Meta CEO Mark Zuckerberg. A large section is titled “Legal Escalations,” and the notes state that “[Zuckerberg] wants to move this stuff forward,” and “we need to find a way to unblock all this.” (Jan. 17, 2023)

  • In an internal message, a Meta engineer expresses concern about using Meta IP addresses “to load through torrents pirate content” and “torrenting from a corporate laptop doesn’t feel right.” (April 21, 2023)

It’s worth noting that the damning messages sampled above aren’t from the redacted material Meta is trying to hide. This is the stuff they’re willing to reveal. Which raises the question of what’s contained in the material still to come.

Ongoing court cases

These cases are just two among dozens of lawsuits moving through federal courts in California, New York, and other jurisdictions. TCAI has a curated guide to the most important cases.

We will continue to cover the progress of these cases in the coming months.

