
If you’ve been following court cases that relate to artificial intelligence, you probably saw some headlines about a recent federal court decision in the US District Court for Delaware, in a copyright-infringement case filed by Thomson Reuters against a company called Ross Intelligence. I saw headlines that claimed this was a “landmark ruling” and a “major win” for content creators, that the AI industry was on the ropes, that this decision marked a turning point in the debate over copyright and artificial intelligence, that fair use is over as a defense for AI training, etc. etc. Is this accurate? Not really. It is definitely true that the court’s ruling is the first significant federal decision related to AI and copyright. But there are a number of reasons why this case doesn’t have as much impact on AI and copyright as the headlines might lead you to believe.
Before I continue, I should note my bias on the question of AI and copyright: Faithful readers of Torment Nexus will recall that in a previous post, I discussed the issue of whether the indexing of content by AI engines should be considered fair use. As I tried to argue in that post, it’s my view that it should. Do LLMs scrape and ingest copyrighted content in large quantities, in most cases without permission? Yes. Do they use this content to generate responses to questions or prompts that relate to the topics discussed in the original versions of that content? Yes. Nevertheless, I believe — as a number of copyright and intellectual property experts do — that this activity should fall under the fair-use exception in US copyright law, for a number of reasons outlined in that post. I’ll get to some of that later; I just wanted to state my bias up front before I continue.
First, some of the facts related to this particular case: Thomson Reuters, which operates the Reuters news-wire service, also owns a number of professional databases that make up the majority of its business. One of those is called Westlaw, and it’s fascinating to me that for years the company had what amounted to a monopoly on the method of citing legal cases in US courts. It seems bizarre now, but Westlaw owned a copyright that covered the system of page-numbering used in US courts, so you literally couldn’t even refer to a previous case for precedent without infringing on Westlaw’s copyright, and the company spent years suing everyone who tried to use it without paying for it — like Lexis-Nexis, a competing legal database. That monopoly over page numbering was mostly dismantled in the late 1990s, but Westlaw still has a copyright on the way cases are summarized, something known as “headnotes.”
Note: This is a version of my Torment Nexus newsletter, which I send out via Ghost, the open-source publishing platform. You can see other issues and sign up here.
Note: In case you are a first-time reader, or you forgot that you signed up for this newsletter, this is The Torment Nexus. You can find out more about me and this newsletter in this post. This newsletter survives solely on your contributions, so please sign up for a paying subscription or visit my Patreon, which you can find here. I also publish a daily email newsletter of odd or interesting links called When The Going Gets Weird, which is here.
Fair use is not binary

What triggered this case is that Ross Intelligence (which shut down not long after the case was filed) tried to create a “natural language search engine” using machine learning, which would allow users to enter questions and have the AI software produce quotations and citations from relevant judicial opinions. In order to train the AI, Ross needed access to a database of legal cases, but Westlaw refused to offer it a license. So Ross turned to a third-party legal-research company, LegalEase Solutions, which did have a Westlaw license and which used the headnotes to compile a database of cases (Ross claims it was unaware that the headnotes had been used). The result was a database of about 25,000 sets of questions and answers, which Ross then used as training data. Thomson Reuters filed suit in 2020, claiming direct, contributory, and vicarious copyright infringement.
In the decision released last week, Judge Bibas ruled that Ross did infringe on the copyright held by Thomson Reuters, and that the fair-use provision in copyright law did not cover Ross’s software. As some of you may know, and as I described in my previous Torment Nexus piece, fair use is not a black-and-white binary choice, but a test that evaluates four factors: 1) the purpose and character of the infringing use; 2) the nature of the copyrighted work; 3) the amount of the original that is used; and 4) the effect on the value of or market for the original work. As JDSupra summarizes it, the judge ruled that even though the headnotes didn’t appear in the final product, Ross had used them to produce a competing commercial product that would negatively affect Thomson Reuters’ business.
One reason why this isn’t quite the slam-dunk for copyright in the war against AI indexing is that Bibas specifically noted that his ruling didn’t apply to “generative AI,” which ingests a bunch of content but then uses algorithms and other magic to come up with (arguably) new ways of expressing concepts. Ross Intelligence didn’t do this — it simply created a database from which it could extract existing legal opinions, which were similar enough to the headnotes to infringe copyright. “Because the AI landscape is changing rapidly, I note for readers that only non-generative AI is before me today,” Bibas wrote. In other words, extrapolating from his decision so that it applies to generative AI of the kind that OpenAI, Perplexity, and Google’s Gemini produce would be a stretch. That’s not to say lawyers won’t try to do it anyway, but it’s not a slam-dunk.
As I tried to argue in my earlier piece, the claims of infringement against OpenAI and other LLMs would make sense if they reproduced exact copies of existing works, but they don’t do that, at least not for written works (you could make the argument that some image-generation engines do, but that’s a subject for a future column). The New York Times lawsuit gave examples of when it said ChatGPT quoted verbatim from existing stories, but in order to produce these it had to essentially hack the system — in ways that are no longer possible, according to OpenAI — and it took a lot of time and effort. It’s true that the Bibas decision means AI indexing is not per se fair use, but then it was never considered to be that by anyone who understands the issue, because virtually nothing is per se fair use. It’s always a question of balancing the four factors.
What makes something original?

In terms of whether the Thomson Reuters case has wide applicability to cases against OpenAI and other LLMs, it’s important to note one other aspect of the Bibas ruling. Before the fair-use test can be applied, a court has to decide whether copyright even applies to the content in question. The law only protects what it calls “original works of authorship,” so one of the key questions is whether the thing that has been infringed was “original” or not (facts cannot be copyrighted). The concept of originality implies that something is novel or creative — a new way of expressing something. If you think that the definition of “creative” is a little murky, you would be 100 percent correct.
Also, pieces of content that are too short can’t be copyrighted. What does “too short” mean? Great question. There is no legal definition. So is a tweet too short to be copyrighted? No one is quite sure, because it has never been challenged in court (also, I would argue many tweets — including some of mine — aren’t even remotely creative or original, but that’s also open to debate). All of this is important because the headnotes that the Thomson Reuters case refers to are typically only a few sentences long, and are in most cases a factual statement about the details of a decision, like you might see at the top of a news article.
Part of what makes this case complicated is that Judge Bibas seems to have changed his mind about whether Westlaw’s headnotes could be copyrighted or not — in fact, in the latest decision he made a 180-degree turn from his previous ruling on that point. In his original 2023 decision, in which he rejected Thomson Reuters’ request for summary judgment, he suggested that the headnotes didn’t exhibit enough originality to make them copyrightable, since they mostly just summarized existing facts of legal decisions, and that a jury would ultimately have to rule on this question. Here’s how he described it then:
Thomson Reuters’s allegedly original expression in its headnotes still reflects uncopyrightable judicial opinions. So the strength of its copyright depends on how much the headnotes overlap with the opinions. Closely hewing differs from copying: If a headnote merely copies a judicial opinion, it is uncopyrightable. But if it varies more than “trivial[ly],” then Westlaw owns a valid copyright. Though editors may have made creative choices about which points of law to summarize, how to summarize them, and where to attach the headnote, those choices are constrained. In general, the headnotes will flag the most salient points of law, largely track the language of the opinion, and be placed at the beginning of a paragraph. This approach is akin to news reporting, which, though protected, must be carefully separated from the unprotected underlying facts.
So what changed between the original ruling in 2023 and now? In essence, the judge seems to have changed his mind about what the term ‘original’ means in the context of something like writing headnotes. He describes it this way: “I previously thought that originality depended on how much the headnotes overlap with the text of the opinions …. but the originality threshold is extremely low, requiring only some minimal degree of creativity …. some creative spark.” Thomson Reuters’ headnotes, Bibas said, introduced creativity into the process simply by distilling, synthesizing, or explaining part of an opinion. He compares this process to “that of a sculptor. A block of raw marble, like a judicial opinion, is not copyrightable. Yet a sculptor creates a sculpture by choosing what to cut away and what to leave in place. That sculpture is copyrightable.”
Sculpting vs. summarizing

Eric Goldman — a professor at the Santa Clara University School of Law and an expert in intellectual property as it applies to the internet — notes that he has a number of issues with the Bibas decision, including the question of whether Thomson Reuters’ headnotes are copyrightable or not. “The headnotes are often just a sentence or two, which ordinarily is too short to merit copyright protection,” he wrote. “The authors’ creativity here is further constrained by the headnote format, which seeks to be as faithful and accurate to the source material as possible. The degree of permissible variation among independent creators of headnotes would be quite small.” Goldman also said the sculptor analogy was problematic, because sculptors have “a wide range of freedom to express themselves, while summarizers of court opinions do not.”
The Authors Alliance also argues that the judge got this part wrong:
The court claims that the Westlaw headnotes are original both individually and as a compilation, and the Key Number System is original and protected as a compilation. “Original” has a special meaning in US copyright law: It means that a work has a modicum of human creativity that our society would want to protect and encourage. Based on the evidence that survived redaction, it is near impossible to find creativity in any individual headnotes. They consist of verbatim copying of judicial texts, along with some basic paraphrasing of facts.
On top of this, Goldman argued that Bibas erred on the four-factor test, particularly the first factor, the purpose and character of the use. This is where the question of “transformativeness” comes in — a question that came down in Google’s favor in the Google Books decision, despite the fact that it copied millions of books. The judge in that case ruled that the creation of a searchable index was transformative (although there were other factors as well, including the fact that Google did not reproduce entire books). Goldman argues that Bibas ignores cases that applied the transformativeness standard to content indexing. In addition, he writes: “I don’t see how the court can say there was no transformation when Ross’ outputs included none of the allegedly infringing material. It’s impossible to be more transformative than that.”
What Goldman’s and the Authors Alliance’s arguments suggest — along with the fact that Judge Bibas completely reversed his prior ruling on the question of originality — is that another court could find the exact opposite: in other words, it could agree with Bibas’ original ruling, which suggested that the Thomson Reuters headnotes are not creative enough to merit copyright protection, and therefore that there is no infringement. For these and the other reasons that I explored above — and some others that I don’t have time to go into (Google the “merger doctrine” in copyright law) — it’s hard to argue that the Bibas decision is some kind of slam-dunk ruling in favor of content creators and against generative AI. Maybe one of the other 40 or so cases involving AI and copyright could provide that kind of argument, but probably not this one.
Got any thoughts or comments? Feel free to either leave them here, or post them on Substack or on my website, or you can also reach me on Twitter, Threads, BlueSky or Mastodon. And thanks for being a reader.