Dozens of big brands have blocked GPTBot, OpenAI’s new web crawler

25 August 2023 Marketing, Press Review, SEO

At least 69 of the 1,000 most popular websites in the world have blocked GPTBot, the new web crawler OpenAI introduced Aug. 7, according to a new analysis.

And the percentage of sites is increasing by about 5% per week, according to AI content and plagiarism service Originality.ai.

Why we care. To block or not to block ChatGPT? That has been the big question for many SEOs. Clearly, several popular websites have already blocked GPTBot, presumably because they don’t want OpenAI scraping their data to help train its models – at least not without compensation. Additionally, ChatGPT does not cite or link to its sources.

By the numbers. The 15 most popular sites blocking GPTBot, according to the analysis, are:

amazon.com
quora.com
nytimes.com
shutterstock.com
wikihow.com
cnn.com
foursquare.com
healthline.com
scribd.com
businessinsider.com
reuters.com
medicalnewstoday.com
goodhousekeeping.com
amazon.co.uk
tumblr.com

But. Even though many sites are blocking GPTBot, most of them are not also blocking CCBot, Common Crawl’s web crawler. Part of the training data used by OpenAI, Google and others comes from Common Crawl.

There are a few noteworthy exceptions that block both bots, such as the New York Times, which clearly does not want its content used to train AI systems. Other popular websites blocking both GPTBot and CCBot include shutterstock.com, reuters.com and goodhousekeeping.com.

At least 62 of the top 1,000 websites have blocked CCBot.
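For readers who want to check which of the two crawlers a given robots.txt file turns away, here is a minimal sketch using Python's standard `urllib.robotparser`. The sample robots.txt contents, the `blocked_bots` helper and the `example.com` URL are illustrative assumptions, not taken from Originality.ai's study or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks GPTBot only (illustration, not a real site's file).
GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical robots.txt that blocks both GPTBot and CCBot.
BOTH_BOTS = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""


def blocked_bots(robots_txt, bots=("GPTBot", "CCBot"), url="https://example.com/"):
    """Return the bots from `bots` that these rules do not allow to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return [bot for bot in bots if not parser.can_fetch(bot, url)]


print(blocked_bots(GPTBOT_ONLY))  # ['GPTBot']
print(blocked_bots(BOTH_BOTS))    # ['GPTBot', 'CCBot']
```

The first sample mirrors the common case described above: GPTBot is turned away, while CCBot falls through to the permissive wildcard rule.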

Limitations. The robots.txt files for 241 of the 1,000 websites were not identified or inspected as part of this analysis. (That’s why I wrote “at least” in the opening sentence.)
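To see what "at least" means in practice, the figures reported above can be turned into rough bounds. This is a back-of-the-envelope sketch: the variable names are hypothetical, and the upper bound assumes the extreme case in which every uninspected site blocks GPTBot, which the analysis does not claim.

```python
TOP_SITES = 1000       # sites in the study
BLOCKING_GPTBOT = 69   # sites confirmed to block GPTBot
NOT_INSPECTED = 241    # sites whose robots.txt was not identified/inspected

inspected = TOP_SITES - NOT_INSPECTED  # 759 sites actually checked

# Lower bound: only the confirmed blockers, measured against all 1,000 sites.
lower_bound = BLOCKING_GPTBOT / TOP_SITES

# Share among the sites whose robots.txt was actually inspected.
share_of_inspected = BLOCKING_GPTBOT / inspected

# Upper bound (assumption): every uninspected site also blocks GPTBot.
upper_bound = (BLOCKING_GPTBOT + NOT_INSPECTED) / TOP_SITES

print(f"lower bound:        {lower_bound:.1%}")         # 6.9%
print(f"share of inspected: {share_of_inspected:.1%}")  # 9.1%
print(f"upper bound:        {upper_bound:.1%}")         # 31.0%
```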

Originality.ai’s analysis. Websites That Have Blocked OpenAI’s GPTBot – 1000 Website Study

Dig deeper. Should you block ChatGPT’s web browser plugin from accessing your website?

About the author

Danny Goodwin has been Managing Editor of Search Engine Land & Search Marketing Expo – SMX since 2022. He joined Search Engine Land in 2022 as Senior Editor. In addition to reporting on the latest search marketing news, he manages Search Engine Land’s SME (Subject Matter Expert) program. He also helps program U.S. SMX events. Goodwin has been editing and writing about the latest developments and trends in search and digital marketing since 2007. He previously was Executive Editor of Search Engine Journal (from 2017 to 2022), managing editor of Momentology (from 2014 to 2016) and editor of Search Engine Watch (from 2007 to 2014). He has spoken at many major search conferences and virtual events, and has been cited for his expertise by a wide range of publications and podcasts.

https://searchengineland.com/websites-blocking-gptbot-431183
