NYT to OpenAI: No hacking here, just ChatGPT bypassing paywalls

Late Monday, The New York Times responded to OpenAI’s claims that the newspaper “hacked” ChatGPT to “set up” a lawsuit against the leading AI company.

“OpenAI is wrong,” The Times repeatedly argued in a court filing opposing OpenAI’s motion to dismiss the NYT’s lawsuit accusing OpenAI and Microsoft of copyright infringement. “OpenAI’s attention-grabbing claim that The Times ‘hacked’ its products is as irrelevant as it is false.”

OpenAI had argued that NYT made "tens of thousands of attempts to generate" supposedly "highly anomalous results" showing that ChatGPT would produce excerpts of NYT articles. The NYT's allegedly deceptive prompts—such as repeatedly asking ChatGPT, "what's the next sentence?"—targeted "two uncommon and unintended phenomena" in both its developer tools and ChatGPT: training data regurgitation and model hallucination. OpenAI characterized both as "a bug" that the company says it intends to fix, and claimed that no ordinary user would use ChatGPT this way.

But while defending the tactics it used to prompt ChatGPT into spouting memorized training data—including more than 100 NYT articles—the NYT also pointed to reports that ChatGPT users have frequently used the tool to generate entire articles and bypass paywalls.

According to the filing, NYT still has no idea how many of its articles were used to train GPT-3 and OpenAI's subsequent AI models, or which specific articles were used, because OpenAI has "not publicly disclosed the makeup of the datasets used to train" its AI models. Rather than setting up a lawsuit, NYT argued, it was prompting ChatGPT to gather evidence and determine the full extent of the tool's copyright infringement.

To figure out if ChatGPT was infringing its copyrights on certain articles, NYT “elicited examples of memorization by prompting GPT-4 with the first few words or sentences of Times articles,” the court filing said.
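
For readers curious what that kind of prefix-probing might look like in practice, here is a minimal sketch, assuming OpenAI's Python SDK; the model name, prompt wording, sample text, and similarity check are illustrative assumptions rather than details from the court filing.

    from difflib import SequenceMatcher
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Hypothetical sample text standing in for a paywalled article; not from the filing.
    opening = "The first few words of a Times article go here"
    full_text = "The first few words of a Times article go here, followed by the rest of the piece."

    # Prompt the model with only the article's opening words and ask it to continue.
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[{"role": "user", "content": f"What's the next sentence?\n\n{opening}"}],
    )
    continuation = response.choices[0].message.content

    # A high similarity between the model's continuation and the real article text
    # would suggest the passage was memorized during training.
    print(SequenceMatcher(None, continuation, full_text).ratio())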

OpenAI had tried to argue that “in the real world, people do not use ChatGPT or any other OpenAI product” to generate precise text from articles behind paywalls. But the “use of ChatGPT to bypass paywalls” is “widely reported,” NYT argued.

“In OpenAI’s telling, The Times engaged in wrongdoing by detecting OpenAI’s theft of The Times’s own copyrighted content,” NYT’s court filing said. “OpenAI’s true grievance is not about how The Times conducted its investigation, but instead what that investigation exposed: that Defendants built their products by copying The Times’s content on an unprecedented scale—a fact that OpenAI does not, and cannot, dispute.”

NYT declined Ars’ request to comment. OpenAI did not immediately respond to Ars’ request to comment.

ChatGPT users bypassing paywalls

According to the NYT’s court filing, ChatGPT outputs initially only infringed copyright by “showing copies and/or derivatives of Times works that were copied to build the model.” But then, in May 2023, a “Browse By Bing” plug-in was introduced to ChatGPT that “enabled ChatGPT to retrieve content beyond what was included in the underlying model’s training dataset,” infringing copyright by “showing synthetic search results that paraphrase Times works retrieved and copied in response to user search queries in real time.”

This feature enabled ChatGPT users to bypass paywalls and access more recent content from outlets like NYT, which caused OpenAI to temporarily disable “Browse By Bing” last July.

“We’ve learned that the browsing beta can occasionally display content in ways we don’t want, e.g. if a user specifically asks for a URL’s full text, it may inadvertently fulfill this request,” OpenAI’s help page said. “We are temporarily disabling Browse while we fix this.”

OpenAI’s decision to disable this feature riled some users who were using ChatGPT to bypass paywalls. In a ChatGPT subreddit, thousands took notice of a post calling attention to the unintended feature, commenting “Wow, so useful!” and joking, “Enjoy it while it lasts.”

On an OpenAI community page, one paid ChatGPT user complained that OpenAI is “working against the paid users of ChatGPT Plus. This time they’re taking away Browsing, because it reads the content of a site that the user asks for? Please, that’s what I pay for Plus for.”

“I know it’s no use complaining, because OpenAI is going to increasingly ‘castrate’ ChatGPT 4,” the ChatGPT user continued, “but there’s my rant.”

NYT argued that public reports of users turning to ChatGPT to bypass paywalls “contradict OpenAI’s contention that its products have not been used to serve up paywall-protected content, underscoring the need for discovery” in the lawsuit, rather than dismissal.

https://arstechnica.com/?p=2009620