Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT’s dataset and fines of up to $150,000 per infringing piece of content.
NPR spoke to two people “with direct knowledge” who confirmed that the Times’ lawyers were mulling whether a lawsuit might be necessary “to protect the intellectual property rights” of the Times’ reporting.
Neither OpenAI nor the Times immediately responded to Ars’ request for comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become “the most high-profile” legal battle over copyright protection since ChatGPT’s explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
Of course, ChatGPT isn’t the only generative AI tool drawing legal challenges over copyright claims. In April, experts told Ars that image-generator Stable Diffusion could be a “legal earthquake” due to copyright concerns.
But OpenAI seems to be a prime target for early lawsuits, and NPR reported that OpenAI risks a federal judge ordering ChatGPT’s entire dataset to be completely rebuilt if the Times successfully proves that the company copied its content illegally and the court restricts OpenAI’s training models to explicitly authorized data only. OpenAI could also face huge fines for each piece of infringing content, a massive financial blow coming just months after The Washington Post reported that ChatGPT has begun shedding users, “shaking faith in AI revolution.” Beyond that, a legal victory could trigger an avalanche of similar claims from other rights holders.
Unlike authors who appear most concerned about retaining the option to remove their books from OpenAI’s training models, the Times has other concerns about AI tools like ChatGPT. NPR reported that a “top concern” is that ChatGPT could use The Times’ content to become a “competitor” by “creating text that answers questions based on the original reporting and writing of the paper’s staff.”
As of this month, the Times’ TOS prohibits any use of its content for “the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.”
That update now appears to give the Times an extra layer of protection, as NPR reports that the media outlet is reconsidering a licensing deal with OpenAI. The licensing deal would have ensured that OpenAI paid for NYT content used to train its models. According to NPR, meetings between OpenAI and the Times have become “contentious,” making the deal appear increasingly unlikely as the Times weighs whether any licensing arrangement would be worthwhile when the resulting product could become its fiercest competitor.
To defend its AI training models, OpenAI would likely have to claim “fair use” of all the web content the company sucked up to train tools like ChatGPT. In the potential New York Times case, that would mean proving that copying the Times’ content to craft ChatGPT responses would not compete with the Times.
Experts told NPR that would be challenging for OpenAI because, unlike Google Books—which won a federal copyright challenge in 2015 because its excerpts of books did not create a “significant market substitute” for the actual books—ChatGPT could, for some web users, actually replace the Times’ website as a source of its reporting.
The Times’ lawyers appear to think this is a real risk, and NPR reported that, in June, NYT leaders issued a memo to staff that seems like an early warning of that risk. In the memo, the Times’ chief product officer, Alex Hardiman, and deputy managing editor Sam Dolnick said a top “fear” for the company was “protecting our rights” against generative AI tools.
“How do we ensure that companies that use generative AI respect our intellectual property, brands, reader relationships, and investments?” the memo asked, echoing a question being raised in newsrooms that are beginning to weigh the benefits and risks of generative AI.
Last month, the Associated Press became one of the first news organizations to strike a licensing deal with OpenAI, but the terms of the deal were not disclosed. Today, AP reported that it had joined other news organizations in developing standards for the use of AI in newsrooms, acknowledging that many “news organizations are concerned about their material being used by AI companies without permission or payment.”
In April, the News Media Alliance published AI principles, seeking to defend publishers’ intellectual property by insisting that generative AI “developers and deployers must negotiate with publishers for the right to use” publishers’ content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Ars could not immediately reach the News Media Alliance for comment on the potential impact of the NYT case.
https://arstechnica.com/?p=1961547