The New York Times Updates Terms of Service to Prevent AI Scraping Its Content


While tech companies like OpenAI are reticent to disclose what they train their AI models on, The Washington Post analyzed Google's C4 dataset, a smaller, filtered version of the Common Crawl dataset, to understand what was training the models. It found evidence that content from 15 million websites, including The New York Times, has been used to train LLMs such as Meta's LLaMA and Google's T5 (an open-source language model that helps developers build software for translation tasks).

All this has spurred other publishers to reevaluate their terms of service, according to Chris Pedigo, SVP for government affairs at the trade body Digital Content Next, whose members include The New York Times and The Washington Post.

More licensing deals to come

While it's unclear how AI companies will respond to these updated terms of service, they have a vested interest in shielding themselves from legal repercussions.

As a result, according to Pedigo, discussions are underway between AI companies and major publishers to establish licensing agreements along the lines of the deal between OpenAI and The Associated Press.

These deals are primarily intended to have AI companies compensate publishers for their content. However, publishers also want the agreements to go beyond purely financial terms.

Ongoing negotiations address how publishers should be cited for their content, including through footnotes. At the same time, there is a focus on establishing mechanisms within AI companies, such as guardrails and fact-checking processes, to prevent LLMs from generating factually inaccurate content.

“Publishers would not want to be associated with that, especially if they’re going to have a licensing deal,” said Pedigo. “Publishers want to make sure that information meets the brand level.”


https://www.adweek.com/media/the-new-york-times-updates-terms-of-service-to-prevent-ai-scraping-its-content/
