
Google started rolling out the June spam update, the second of the year. It enforces documented spam policies, and one of those policies now covers more ground than it once did.
Google’s spam rules treat attempts to “manipulate generative AI responses” in Search as a violation, and that’s one of the policies the update is enforcing.
A Cornell Tech preprint picked up by 404 Media gets at why the policy is harder to enforce than its wording implies. The community pages that AI research agents lean on can also carry third-party comments, and a comment can plant a recommendation that the page’s author never wrote.
What Google labels spam, therefore, travels through the very retrieval that these agents rely on. And the research finds that the obvious defenses all come with drawbacks.
For anyone trying to push a brand into AI-generated answers, know that the line between optimization and spam is getting redrawn.
The Stakes
SE Ranking’s tracking of AI Mode found Google increasingly pointing to its own properties, with self-citations up to roughly a fifth of AI Mode citations in its latest report.
With more citations pointing to Google and fewer to external websites, the pull to manufacture a citation rises accordingly.
A gray market has already begun to form, and the Cornell authors point out that marketers are busy testing ways to nudge AI-generated answers.
Businesses, meanwhile, don’t have the data they need to see what’s happening. As our earlier coverage of agentic search laid out, no dashboard tells a site whether it landed in an AI answer, got cited in a generated report, or was passed over.
The result is a violation Google can name but the site involved often can’t see.
What The Research Found
The paper, titled “Deep-Research Agents Can Be Poisoned via User-Generated Content,” hasn’t been peer-reviewed. It probes a weak spot in how AI research tools collect their sources. These tools answer a question by firing off a batch of related sub-queries, grabbing the pages that keep coming up across them, and assembling a report with citations.
Analysis revealed the same community pages surfacing repeatedly in those sub-queries. Inside a single topic cluster, one user-generated page turned up in as many as 48% of queries, and user-generated platforms made up 17% to 23% of every URL retrieved. Alter one of those recurring pages, and the change can ripple into the reports for a whole topic.
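To make the two measurements concrete, here is a minimal Python sketch of how retrieval concentration might be computed from a log of sub-queries. It is not the authors' code. The `retrievals` log, the `example` domains, and the `UGC_HOSTS` set are all invented for illustration, and the paper may count URLs differently (for instance, unique URLs instead of every retrieval).

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical retrieval log for one topic cluster: sub-query -> URLs returned.
retrievals = {
    "best budget espresso machine": [
        "https://forum.example/t/espresso-picks",
        "https://reviews.example/espresso-2024",
    ],
    "espresso machine under 300": [
        "https://forum.example/t/espresso-picks",
        "https://shop.example/espresso",
    ],
    "espresso machine for beginners": [
        "https://forum.example/t/espresso-picks",
        "https://blog.example/beginner-espresso",
    ],
    "espresso grinder pairing": [
        "https://blog.example/grinders",
    ],
}

# Assumption: hosts we treat as user-generated platforms.
UGC_HOSTS = {"forum.example"}


def query_share(retrievals):
    """Fraction of sub-queries in which each URL appears at least once."""
    counts = Counter()
    for urls in retrievals.values():
        counts.update(set(urls))  # count a URL once per sub-query
    total = len(retrievals)
    return {url: n / total for url, n in counts.items()}


def ugc_url_share(retrievals, ugc_hosts):
    """Fraction of all retrieved URLs that sit on a user-generated host."""
    all_urls = [u for urls in retrievals.values() for u in urls]
    ugc = [u for u in all_urls if urlparse(u).hostname in ugc_hosts]
    return len(ugc) / len(all_urls)


shares = query_share(retrievals)
top_url = max(shares, key=shares.get)
print(top_url, shares[top_url])
print(ugc_url_share(retrievals, UGC_HOSTS))
```

In this toy log, the forum thread appears in three of four sub-queries, so a single edit to that thread would be read in most of the sessions for the topic. That is the kind of concentration the paper describes.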
The authors found that roughly 13 words of planted text on a recurring page were enough to insert an attacker’s chosen entity into the finished report in 38% to 51% of sessions that retrieved the page.
Scatter the same text across a handful of pages, and the figure climbed to 42% to 62%. Even buried inside a full page, where it made up under 4% of what the agent read, the planted text still surfaced in 30% to 53% of sessions.
The tests covered three open-source research agents: STORM, Co-STORM, and OmniThink. All were run in a simulation so that nothing on the live web was touched.
Where Enforcement Is Hard
Google can label AI-answer manipulation as spam and act on what it catches. Catching it is the hard part. The planted text reads like real advice, and it sits on the same pages the tools were always going to read, so telling it apart from a normal post is the main problem.
The research team looked for a defense against planted text but didn’t find one. They tried cutting user-generated sources out, screening them with a language model before use, and combing the finished report for claims that didn’t hold up.
None of the three stopped the attack without making the results worse for the user. Drop the user-generated sources, and you lose the community detail that makes AI search tools worth using.
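The first of those defenses, cutting user-generated sources out, is simple to sketch, which also makes its cost easy to see. The Python below is an illustrative assumption, not the researchers' implementation: the `UGC_HOSTS` blocklist, the `drop_ugc` helper, and the `example` domains are all hypothetical.

```python
from urllib.parse import urlparse

# Assumption: a hand-maintained blocklist of user-generated platforms.
UGC_HOSTS = {"forum.example", "qa.example"}


def drop_ugc(sources, ugc_hosts=UGC_HOSTS):
    """Split retrieved URLs into kept sources and dropped user-generated ones."""
    kept, dropped = [], []
    for url in sources:
        host = urlparse(url).hostname or ""
        # Match the host itself or any subdomain of a blocked host.
        is_ugc = host in ugc_hosts or any(
            host.endswith("." + blocked) for blocked in ugc_hosts
        )
        (dropped if is_ugc else kept).append(url)
    return kept, dropped


sources = [
    "https://forum.example/t/which-plumber-to-call",
    "https://old.forum.example/t/plumber-reviews",
    "https://cityguide.example/plumbers",
]
kept, dropped = drop_ugc(sources)
print(len(kept), "kept,", len(dropped), "dropped")
```

The filter removes any planted comment on those hosts, but it removes every genuine first-hand post along with it. That is the trade-off the researchers ran into.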
The tools most people use sit outside that test. ChatGPT Deep Research and Gemini Deep Research run retrieval the researchers couldn’t poison without crossing an ethical line, so they only measured citation habits. Gemini leaned on user-generated content 12.1% of the time, which the authors call a hint of exposure, not a tested result. OpenAI’s tool reached for it far less.
Why This Matters For Search Professionals
The moves that can help lift a brand into AI answers are similar to the manipulation tactics Google calls “spam,” such as planting mentions across the sites these tools read. We don’t know where Google’s line falls between earning a mention and engineering one.
For ecommerce and local brands, the danger comes from the other direction.
The test cases were the ordinary things people ask, such as which service to call, which product to buy, and where to eat. A rival or a scammer can slip an unfamiliar name into those answers, right next to the legitimate options, and the brand being edged out would never know it.
For news publishers and bigger brands, the worry is trust in the answer their name lands in. A citation from an AI tool is seen as a win, but a citation only reflects what the tool pulled, not whether that page was right, and the answer can be steered by content the brand never wrote.
There’s no tidy fix to all this. AI visibility has become a surface you actively monitor, not just a channel you passively optimize for.
Looking Ahead
The authors called user-generated manipulation an open problem that no single platform can fix on its own. Reddit has flagged its long-running fight against coordinated manipulation, and Google has bolted context labels onto some Reddit-sourced material in AI Overviews. Neither one touches the retrieval concentration the paper points to.
Google hasn’t indicated how it intends to enforce the generative-AI manipulation policy, whether through a dedicated update or through the SpamBrain system and manual reviews that it relies on for most violations.
For now, the policy calls the behavior out of bounds, and vetting AI responses still rests with whoever is reading them.
https://www.searchenginejournal.com/googles-spam-update-now-reaches-ai-answers-enforcement-is-hard/580535/


