New York Times: Don’t use our content to train AI systems

Although Google wants all online content available for AI training, the New York Times clearly wants to opt out.

The Times has changed its terms of service, aiming to prevent AI companies from using the media organization’s content to train their systems.

Why we care. Many large language models are trained using website content (see: Search the 15.7 million websites in Google’s C4 dataset). While Google is exploring alternative or supplemental ways of controlling crawling and indexing beyond robots.txt, many brands (e.g., Reddit) are making it clear right now that they don’t want their content used to improve the products and increase the profits of Google, Microsoft and OpenAI – at least not without compensation. You may want to consider adding similar AI-related messaging to your website’s terms page.
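For context on the robots.txt mechanism mentioned above, here is a minimal sketch of how a site owner could check whether a given set of robots.txt rules would turn away a particular crawler. It uses Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the rules and the URL are all made up for illustration. Note that robots.txt is only a request that well-behaved crawlers choose to honor, which is part of why publishers are also turning to terms of service.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. "ExampleAIBot" is an invented crawler
# name used only for illustration; substitute the user agent you care about.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if these robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"  # placeholder URL
    for agent in ("ExampleAIBot", "OtherBot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

A check like this only confirms what the file asks of crawlers. It does not verify that any particular AI company's crawler actually complies.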

What has changed. The New York Times updated its terms of service page Aug. 3. It includes AI-specific additions that apply to its content (which it defines as “including, but not limited to text, photographs, images, illustrations, designs, audio clips, video clips, ‘look and feel,’ metadata, data, or compilations”).

In the “Prohibited use of the services” section:

(3) use the Content for the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system.

Will AI companies compensate publishers? OpenAI and the Associated Press signed a deal last month. OpenAI licensed the AP’s news article archive dating back to 1985 for training.

Google and the New York Times Co. already have a lucrative “commercial agreement” in place, but that deal is about working together on “tools for content distribution and subscriptions.”

Microsoft is also promising publishers some sort of revenue sharing. However, most of the benefits will apparently go to members of its Start program.

About the author

Danny Goodwin has been Managing Editor of Search Engine Land & Search Marketing Expo – SMX since 2022. He joined Search Engine Land in 2022 as Senior Editor. In addition to reporting on the latest search marketing news, he manages Search Engine Land’s SME (Subject Matter Expert) program. He also helps program U.S. SMX events. Goodwin has been editing and writing about the latest developments and trends in search and digital marketing since 2007. He previously was Executive Editor of Search Engine Journal (from 2017 to 2022), managing editor of Momentology (from 2014 to 2016) and editor of Search Engine Watch (from 2007 to 2014). He has spoken at many major search conferences and virtual events, and has been cited for his expertise by a wide range of publications and podcasts.

https://searchengineland.com/new-york-times-content-train-ai-systems-430556