New York Times Sues Microsoft and OpenAI, Alleging Copyright Infringement

News publisher says AI tools use its content without permission; tech companies have said training AI with web content is ‘fair use’

31.12.2023 13:00

The Wall Street Journal

By Alexandra Bruell, The Wall Street Journal

The New York Times sued Microsoft and OpenAI for alleged copyright infringement, touching off a legal fight over generative-AI technologies with far-reaching implications for the future of the news publishing business.

In a complaint filed Wednesday, the Times said the technology companies exploited its content without permission to create their AI products, including OpenAI’s humanlike chatbot ChatGPT and Microsoft’s Copilot. The tools were trained on millions of pieces of Times content, the suit said, and draw on that material to serve up answers to users’ prompts.

The suit opens a new front in a yearslong battle between tech and media companies over the economics of the internet, pitting one of the news industry’s biggest players against pioneers of a new wave of artificial-intelligence technologies. It comes after months of commercial negotiations between the companies failed to produce a deal, according to the Times.

In its complaint, the Times said it believes it is among the largest sources of proprietary information for OpenAI and Microsoft’s AI products. Their AI tools divert traffic that would otherwise go to the Times’ web properties, depriving the company of advertising, licensing and subscription revenue, the suit said.

The Times is seeking damages, in addition to asking the court to stop the tech companies from using its content and to destroy data sets that include the Times’ work.

“Times journalism is the work of thousands of journalists, whose employment costs hundreds of millions of dollars per year,” the Times said in its complaint. “Defendants have effectively avoided spending the billions of dollars that The Times invested in creating that work by taking it without permission or compensation.”

The Times has asked for a jury trial in the suit, which was filed in U.S. federal court in the Southern District of New York.

Representatives for OpenAI and Microsoft couldn’t immediately be reached for comment.

Tech companies building generative-AI tools have generally argued that content available on the open internet can be used to train their technologies under a legal provision called “fair use,” which allows copyrighted material to be used without permission in certain circumstances.

In its suit, the Times said the fair use argument shouldn’t apply, because the AI tools can serve up, almost verbatim, large chunks of text from Times news articles.

The legal landscape surrounding generative AI is unsettled, with the technology still in its early days. Other lawsuits could test the rights of AI companies to “scrape” content from the web to train AI tools, including one by several prominent book authors against OpenAI. In February, Getty Images sued the AI art company Stability AI in Delaware, alleging that it had infringed on Getty’s copyrights. Stability AI at the time said it doesn’t comment on pending litigation.

The U.S. Copyright Office said it launched an initiative to study issues raised by AI, including “the use of copyrighted materials in AI training.” In August, it issued a notice to seek comment on the issue and is assessing whether legislative or regulatory steps are warranted, according to its website.

The Times suit raises the prospect of a fissure in the publishing world—if some major outlets follow the Times in pursuing legal action, while others negotiate for compensation from OpenAI, Microsoft and Google, which is developing its own AI efforts.

Already, a few publishers, including the Associated Press and Axel Springer, the publisher of sites such as Politico and Business Insider, have reached commercial agreements to license their content to OpenAI.

Barry Diller, the chairman of IAC, which owns sites like Better Homes & Gardens, People and Verywell Health, has said he believes publishers’ copyrights are being violated.

Robert Thomson, chief executive of Wall Street Journal parent News Corp, has been vocal about his concerns about AI, including the potential for tools to use publishers’ content without permission.

News Corp has had commercial discussions with AI companies but hasn’t announced any licensing agreements.

Many news media executives look at tech companies with a jaundiced eye after their experiences over the past decade. Google and Facebook helped publishers reach audiences and build up their web traffic, but the tech companies became fearsome competitors for online-ad dollars and had the power to grow or shrink news traffic with algorithmic changes.

Having failed to secure what they saw as their fair share of the explosive internet growth powered by search and social media, publishers don’t want to meet the same fate with AI.

OpenAI started gaining traction last year with a release of ChatGPT that wowed users by generating humanlike written responses to user queries about pretty much anything—from a salsa recipe to a travel itinerary for Greece to information about historical events.

Microsoft entered the picture as a major partner for OpenAI, agreeing to invest $13 billion in the company in exchange for what is essentially a 49% stake in the earnings of its for-profit arm.

The Times said the AI tools OpenAI and Microsoft have created, built in part on its content, have propelled major increases in their valuations. “Using the valuable intellectual property of others in these ways without paying for it has been extremely lucrative for Defendants,” the Times said.

A.G. Sulzberger, the Times’ publisher, has been less outspoken in public than some of his peers about the threats generative-AI platforms pose to the news industry. Now, his company is at the forefront of the legal fight against AI companies.

The Times said it reached out to Microsoft and OpenAI in April to try to reach a commercial deal. “The Times’s goal during these negotiations was to ensure it received fair value for the use of its content, facilitate the continuation of a healthy news ecosystem, and help develop GenAI technology in a responsible way that benefits society and supports a well-informed public,” the company said in its complaint. The Times cited other deals it forged with major tech platforms.

The Times gives priority to digital subscriptions, with a bundle that includes not just news but sports, cooking, games and product recommendations. In the third quarter, the company reported more than nine million digital subscribers.

The Times supplied several examples of output from OpenAI’s ChatGPT that closely resembled passages in Times articles. For example, OpenAI recited large portions of a 2019 report based on an 18-month investigation of predatory lending in New York City’s taxi industry, the complaint said.

“The law does not permit the kind of systematic and competitive infringement that Defendants have committed,” The Times said in the suit.

Write to Alexandra Bruell at alexandra.bruell@wsj.com