As artificial intelligence programs have become ubiquitous over the past year, so have lawsuits from authors and other creative professionals who argue that their work has been essential to that ubiquity and that they deserve to be paid for it. The “large language models” (or LLMs) that power text-generating AI tools are trained on content that has been scraped from the web without its authors’ consent. Last week, my colleague Yona Roberts Golding wrote about how media outlets, specifically, are weighing legal action against companies that offer AI products, including OpenAI, Meta, and Google. They may have a case: a 2021 analysis of a dataset used by many AI programs showed that half of its top ten sources were news outlets. As Roberts Golding noted, Karla Ortiz, a conceptual artist and one of the plaintiffs in a lawsuit against three AI services, recently told a roundtable hosted by the Federal Trade Commission that the creative economy only works “when the basic tenets of consent, credit, compensation, and transparency are followed.”
As Roberts Golding pointed out, however, AI companies maintain that their datasets are protected by the “fair use” doctrine in copyright law, which allows for copyrighted work to be repurposed under certain limited conditions. Matthew Butterick, Ortiz’s lawyer, told Roberts Golding that he is not convinced by this argument; LLMs are “being held out commercially as replacing authors,” he said, noting that AI-generated books have already been sold on Amazon, under real or fake names. Most copyright experts would probably agree that duplicating a book word for word isn’t fair use. But some observers believe that the scraping of books and other content to train LLMs likely is protected by the fair use exception—or, at least, that it should be. In any case, new debates around news content, copyright, and AI are building on similar debates around other types of creative content—debates that have been live throughout AI’s recent period of rapid development, and that build on much older legal concepts and arguments.