In December, the New York Times fired an early shot in the battle over whether it is legal for artificial intelligence engines such as OpenAI’s ChatGPT to scrape content from the web as fodder for their databases. The Times made clear that it believes the answer is no: the paper sued OpenAI and Microsoft, which has partnered with the company, claiming that their tools used millions of Times articles to train “automated chatbots that now compete with the news outlet as a source of reliable information,” and that, in doing so, they were trying to “free-ride” on the Times’ investment in journalism. The lawsuit, which I wrote about for CJR back in January, claimed that OpenAI and Microsoft were responsible for “billions of dollars” in damages, and that they should be forced to destroy any data that was based on copyrighted material scraped from the Times.
Last week, eight newspapers owned by Alden Global Capital—the New York Daily News, the Chicago Tribune, the Orlando Sentinel, the South Florida Sun Sentinel, the San Jose Mercury News, the Denver Post, the Orange County Register, and the St. Paul Pioneer Press—filed a similar lawsuit against OpenAI and Microsoft claiming copyright infringement, in the same New York court district where the Times made its complaint. The Alden papers did not follow entirely in the Times’ footsteps: A source told Axios that the Alden papers chose to sue OpenAI without first trying to negotiate a licensing deal with the company, a route that the Times pursued prior to taking legal action. But they did join a growing club: since the Times filed suit against OpenAI, Raw Story, Alternet, and The Intercept have done likewise, citing similar grounds. Those sites are reportedly seeking damages of at least two thousand five hundred dollars per violation.
The Alden complaint accuses OpenAI and Microsoft of using millions of its papers’ articles to train AI products, including ChatGPT and Microsoft’s Copilot, without permission. Much like the Times’ lawsuit, Alden’s claim doesn’t specify a desired amount of monetary damages, but says that the publishers are entitled to compensation for the illegal use of their content. The Alden suit also echoes the Times’ in claiming that ChatGPT and Copilot have regularly reproduced the entire text of articles from Alden papers in response to users’ prompts, and that, in most cases, those engines did not link back to the original source, depriving the publishers of revenue.
Note: this post was originally published as the daily newsletter for the Columbia Journalism Review, where I am the chief digital writer.
The Alden suit ultimately accuses Microsoft and OpenAI of taking “the work product of reporters, journalists, editorial writers, editors and others who contribute to the work of local newspapers—all without any regard for the efforts, much less the legal rights, of those who create and publish the news.” The suit goes on to say that content scraping by AI companies is not just a business problem for the newspaper industry but also a “critical issue for civic life in America,” since local news is the “bedrock of democracy” and its future is at risk, including from OpenAI and Microsoft.
OpenAI did introduce a feature last year that allows publishers to opt out of having their content indexed by the company’s AI engine. And, while the company hasn’t yet filed a response to Alden’s suit, it said in a March response to the Times’ claim that it believes its scraping of content is covered by the fair use exemption in copyright law (a view that some copyright experts share, as I have written before), and also that the Times engaged in subterfuge in order to get ChatGPT to reproduce entire articles verbatim. OpenAI maintained that most people would not use the tool in this way, and that, even if they wanted to, the system is designed to prevent such behavior. OpenAI alleged that the Times paid someone to hack OpenAI’s products, and that it likely took thousands of attempts to get ChatGPT to generate the results included in the suit. (Ian Crosby, the lead counsel for the Times, responded that what OpenAI calls hacking was “simply using OpenAI’s products to look for evidence that they stole and reproduced the Times’s copyrighted works.”)
While the Times and Alden have gone to court, other news organizations fighting the same war have taken a very different path: they have chosen to strike licensing deals with OpenAI and other companies, agreeing to provide them with access to their content for training and other purposes in exchange for cash and other perks. Last July, the Associated Press became one of the first outlets to sign an agreement with OpenAI; in return for giving OpenAI access to its archive of news stories, the AP said, it would gain the ability to “leverage OpenAI’s technology and product expertise.” (The financial terms of the agreement were not disclosed; I wrote about the deal at the time.) Kristin Heitmann, the AP’s chief revenue officer, said that the service was pleased that OpenAI had recognized that “fact-based, nonpartisan news content is essential to this evolving technology,” and also that it respected the value of the AP’s intellectual property.
In December, just before the Times filed its lawsuit, OpenAI announced another licensing deal, this time with Axel Springer, the German media giant that owns Politico and Business Insider as well as major European publications including Bild and Die Welt. In return for the right to train its AI engine on Axel Springer content, OpenAI agreed to give users of ChatGPT summaries of news stories from Axel Springer’s various brands, along with attribution and links to the original source—addressing a particular bone of contention for media companies. OpenAI described the licensing deal as being part of a commitment to help publishers and creators develop new revenue models. A source told the Times that the deal is worth more than ten million dollars per year.
The drip of deals has continued this year. In March, OpenAI announced agreements with Le Monde, the French newspaper company, and Prisa Media, a Spanish publishing conglomerate. A week ago, the Financial Times agreed to allow OpenAI to crawl and index its content to train ChatGPT and other AI products; as with the Axel Springer deal, OpenAI agreed to provide summaries of FT stories to ChatGPT users and to link back to the original source. The latest publisher to sign a deal with OpenAI is Dotdash Meredith, the media company that owns magazines including People, Better Homes & Gardens, and InStyle, as well as websites that deliver news on health, investing, and other subjects. Axios reported on Tuesday that the deal will allow Dotdash Meredith to use OpenAI tools to help improve its own AI-powered ad-targeting engine, D/Cipher. Per Axios, IAC—Dotdash Meredith’s parent company—at one point tried to create a coalition of publishers to bargain for AI licensing deals, only for the effort to collapse “due to conflicting business incentives within the industry.”
Nor is OpenAI the only technology company that is signing AI-related deals. A week ago, The Information reported that Google had struck an agreement with News Corp., the Murdoch-owned publisher of the Wall Street Journal, the New York Post, and a number of British and Australian newspapers. Google reportedly agreed to pay News Corp. between five and six million dollars per year, in return for which the news company would “develop new AI-related content and products.” A News Corp. spokesperson told Reuters that “we absolutely do not have an AI content licensing deal with Google,” but allowed that News Corp. does have “a number of partnerships with Google across our businesses.” (An earlier deal under which Google paid for access to News Corp. content recently expired.)
Whether a newspaper decides to sue or to license its content seems to depend in part on its business model. As Axios points out, reaching a deal made sense for the AP because it makes money by licensing content; other publishers, however, are more reliant on revenue generated by search traffic, and as such might be loath to help an AI company produce content that gives readers no reason to click on a link (or no link to click at all). For publishers like the Times, decades’ worth of archived stories are likely a valuable bargaining chip; some observers suspect that the paper’s lawsuit is primarily intended as a means to a bigger payout than OpenAI was willing to provide when the two were still negotiating. For many other media companies, it remains an open question whether the right strategy is to cut a deal and pocket the cash, or go to court with all guns blazing. But the need to pick a side seems to be growing.