Times seeks to expand Microsoft claim over OpenAI supercomputer

The New York Times says Microsoft built custom computing infrastructure to help OpenAI train AI systems on its copyrighted journalism.

By James Whitfield · Staff Writer

The New York Times asked a court Thursday to let it revise its copyright lawsuit against OpenAI and Microsoft, adding sharper allegations about Microsoft’s role in OpenAI’s AI training. In a heavily redacted filing, the newspaper said Microsoft built a custom supercomputer that helped OpenAI use Times journalism without permission.

The filing seeks to adjust the Times’ contributory infringement claim after a recent Supreme Court ruling in a dispute involving Cox Communications and Sony changed the standard for such claims. Under that precedent, plaintiffs must show a defendant intentionally encouraged unlawful conduct, according to the Times’ motion.

Graham James, a New York Times spokesperson, said the company asked to amend its complaint to reflect “new law and new evidence uncovered during discovery.” He said the core allegation remains that Microsoft and OpenAI used millions of Times works to build competing products and profit from them.

Microsoft rejected the move. A company spokesperson said the revised complaint was “a last-ditch effort” to preserve a claim after unfavorable recent precedent.

Supercomputer becomes a focus

The New York Times sued OpenAI in 2023, becoming the first major publisher to bring a copyright case against the company. The newspaper alleged that ChatGPT was trained on its articles, reproduced portions of them, harmed the market for Times subscriptions and damaged its reputation by falsely attributing material to Times reporting.

The proposed amended complaint gives more attention to Microsoft’s infrastructure. The Times says Microsoft did more than offer ordinary cloud services and instead designed a powerful system for OpenAI to train large language models on copyrighted material pulled from the internet.

According to the Times’ filing, the system was built to process large amounts of web content and gave extra weight to Times works because Microsoft and OpenAI wanted high-quality journalism in training data. The newspaper alleges Microsoft helped choose the works that were copied and supplied the technical means to use them without authorization.

The Times also claims Microsoft has benefited financially from AI models trained on its material. In the filing, the newspaper alleges that Microsoft’s use of large language models across its products helped increase its market capitalization by $1 trillion in the past year.

The Times said allowing the amendment would not delay the case because it is not seeking additional discovery tied to the revised claims. It also agreed to drop two claims against all defendants: vicarious copyright infringement and trademark dilution.

Fair use fight continues

OpenAI and Microsoft have argued that training AI systems on copyrighted material qualifies as fair use. OpenAI spokesperson Drew Pusateri said the company’s models are trained on publicly available data and are “grounded in fair use.”

The Times says the tools compete with its journalism because they can produce close copies of Times articles. Its complaint cites user sessions in which ChatGPT allegedly generated substantial passages from articles, including cases where users said they were trying to get around paywalls.

The newspaper also points to alleged hallucinations by Microsoft and OpenAI systems. The complaint says Bing Chat and ChatGPT have attributed fabricated material to the Times, including fake quotes and a nonexistent article linking non-Hodgkin’s lymphoma to orange juice consumption.

OpenAI has argued, according to the Times, that ChatGPT is not a substitute for a Times subscription because the systems transform the material for another use. The newspaper is seeking permanent injunctive relief to stop alleged future infringement, along with damages tied to what it says are profits from copyrighted works the defendants do not own.

This story draws on original reporting from Ars Technica.