26% of the top 100 websites are now blocking GPTBot

At least 26 of the top 100 most popular websites – and 242 of the top 1,000 – are now blocking GPTBot, the web crawler OpenAI introduced Aug. 7, according to an updated analysis.

[Image: Websites blocking GPTBot, September 2023]

Why we care. To block or not to block ChatGPT? That has been a big question for many SEOs because ChatGPT does not cite or link to its sources. We have let search engines crawl our content because there is a clear potential benefit – we get traffic through direct links/citations. Clearly, even more of the most popular websites have decided to block GPTBot, presumably because they don’t want OpenAI scraping their data to help train its models – at least not without some form of compensation.

12 popular websites now blocking GPTBot. These are the new additions from the top 100 most popular sites in the past month, the majority of which publish news and information:

  • pinterest.com
  • indeed.com
  • theguardian.com
  • sciencedirect.com
  • usatoday.com
  • stackexchange.com
  • alamy.com
  • webmd.com
  • dictionary.com
  • washingtonpost.com
  • npr.org
  • cbsnews.com

One big reversal. Interestingly, Foursquare, which was blocking GPTBot last month, no longer is. 

What about CCBot? Common Crawl’s web crawler is still blocked less often than GPTBot – by just 130 websites. As a reminder, Common Crawl provides part of the training data used by OpenAI, Google and others.

  • 109 of the top 1,000 websites block both GPTBot and CCBot.

Limitations. 67 robots.txt files out of the 1,000 websites were not identified/inspected as part of this analysis. (That’s why I wrote “at least” in the opening sentence.)
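
How a robots.txt check might look. The analysis rests on what each site's robots.txt file says about a given crawler. As a rough illustration only (this is not Originality.ai's method, which the article does not describe), the sketch below uses Python's standard-library robots.txt parser to test whether a file disallows GPTBot or CCBot from the site root. The function name `blocked_agents`, the `SAMPLE` file and the example.com URL are hypothetical, and the standard-library parser's user-agent matching is simpler than what real crawlers may apply.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents=("GPTBot", "CCBot"), url="https://example.com/"):
    """Return {agent: True if the robots.txt text disallows `url` for that agent}.

    Hypothetical helper for illustration; it only tests a single URL
    (the site root by default), not every path on the site.
    """
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


# Made-up robots.txt that blocks GPTBot everywhere and allows everyone else.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(SAMPLE))  # {'GPTBot': True, 'CCBot': False}
```

To check a live site instead, `RobotFileParser` also offers `set_url()` and `read()` to fetch the file; a site whose robots.txt cannot be retrieved or parsed would fall into the "not identified/inspected" group mentioned above.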

Originality.ai’s updated analysis. Websites That Have Blocked OpenAI’s GPTBot – 1000 Website Study

Dig deeper. Should you block ChatGPT’s web browser plugin from accessing your website?


About the author

Danny Goodwin

Danny Goodwin has been Managing Editor of Search Engine Land & Search Marketing Expo – SMX since 2022. He joined Search Engine Land in 2022 as Senior Editor. In addition to reporting on the latest search marketing news, he manages Search Engine Land’s SME (Subject Matter Expert) program. He also helps program U.S. SMX events. Goodwin has been editing and writing about the latest developments and trends in search and digital marketing since 2007. He previously was Executive Editor of Search Engine Journal (from 2017 to 2022), managing editor of Momentology (from 2014 to 2016) and editor of Search Engine Watch (from 2007 to 2014). He has spoken at many major search conferences and virtual events, and has been sourced for his expertise by a wide range of publications and podcasts.

https://searchengineland.com/more-popular-websites-blocking-gptbot-432531