Publishers can now opt out of having their data used to train Google’s AI models such as Bard.
Top line
Google’s new control, called Google-Extended, gives publishers the option to opt out of having their data scraped to train AI models while their sites continue to be indexed by search crawlers like Googlebot.
According to Google, the tool lets publishers manage whether their sites help improve Bard and the Vertex AI generative APIs.
However, Alex Berger, senior product marketing director at ad platform Adform, called the move “shady.”
“Saying, ‘Hey we’re going to plagiarize your content, train our database on it—which we’re in turn commercializing—with zero opt-in and consent from your end, but you can opt out if you want to in some obtuse way’ seems like utter nonsense to me,” Berger wrote in a LinkedIn post.
“And then their nuke is that based on their monopoly positioning, it’s basically death if they remove you or block you. They tease now that you aren’t penalized. But I’d bet money within 24 months there’s a ranking dimension that radically penalizes anyone who doesn’t opt in.”
In July, Gizmodo spotted that Google had revised its privacy policy to state that its AI services, including Bard and Cloud AI, could be trained on public data the company scrapes from across the internet.
Adweek has reached out to Google for comment.
Between the lines
Google-Extended can be accessed via the robots.txt file, which serves as a text-based directive informing web crawlers about site access permissions.
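In practice, a publisher that wants to stay visible in Google Search but keep its content out of AI training could add an entry like the following to its robots.txt file (a minimal sketch; the blanket “Disallow: /” shown here is illustrative, and publishers can scope it to specific paths instead):

    User-agent: Google-Extended
    Disallow: /

Because Google-Extended is a standalone control token rather than a separate crawler, blocking it this way leaves Googlebot, and therefore search indexing, untouched.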
“As AI applications expand, web publishers will face the increasing complexity of managing different uses at scale,” the tech giant said. “That’s why we’re committed to engaging with the web and AI communities to explore additional machine-readable approaches to choice and control for web publishers.”
However, publishers have found themselves in a dilemma over Google’s creepy crawlers. Blocking them could mean disappearing from search results, a prominent source of organic traffic and revenue.
This has led some publishers, such as The New York Times, to take a legal route: the Times updated its Terms of Service to forbid the scraping of its content to train machine learning or AI systems.