
Back in October, I wrote about artificial intelligence, and specifically about one of the crucial questions experts still can’t seem to agree on: whether it is going to destroy us or not. In that piece, I also mentioned the debate over whether the indexing or “ingesting” that large language models do is (or at least should be) covered by the fair-use exception in copyright law. I didn’t spend a lot of time on it because it wasn’t directly relevant to the danger issue, but I want to expand here on some of the points I made then, and also in a Columbia Journalism Review piece that I wrote last year. I am not a cheerleader for giant technology companies by any means, but I think there is an important principle at stake. And at the heart of it are some key questions: What (or who) is copyright law for? What was it originally designed to do? And does AI scraping or indexing of copyrighted content fit into that, and if so, how?
The case against AI indexing of content is relatively straightforward: by hoovering up content online and then using it to create a massive database for training large language models, AI engines copy that content without asking and without paying for it (unless the publisher or owner has signed a deal with the AI company, as some news outlets have). This pretty clearly qualifies as de facto copyright infringement, as the Authors Guild, the New York Times, and a number of others have argued and continue to argue. In a similar way, one could imagine that if a company were to copy millions of books and use them to create a massive index of content, that would pretty clearly qualify as infringement as well: copying without permission or payment.
The major difference between these two cases is that the second, hypothetical one actually happened, when Google scanned millions of books as part of its Google Books project between 2002 and 2005, and created an index that allowed users to search for content from those books. After years of back-and-forth negotiations over payment for the infringement, this led to a lawsuit in which the Authors Guild and others argued that Google was guilty of copyright infringement on a massive scale. In the early days of that case, Judge Denny Chin of the Southern District of New York seemed to agree, but he later changed his mind and, in 2013, ruled that Google’s book-scanning activity was covered by the fair-use exception under US copyright law.