Tired Of SEO Spam, Software Engineer Creates A New Search Engine via @sejournal, @martinibuster

A software engineer from New York got so fed up with the irrelevant results and SEO spam in search engines that he decided to create a better one. Two months later, he has a demo search engine up and running. Here is how he did it, and four important insights about what he feels are the hurdles to creating a high-quality search engine.

One of the motives for creating a new search engine was the perception that mainstream search engines contained an increasing amount of SEO spam. After two months, the software engineer wrote about his creation:

“What’s great is the comparable lack of SEO spam.”

Neural Embeddings

The software engineer, Wilson Lin, decided that the best approach would be neural embeddings. He created a small-scale test to validate the approach and noted that the embeddings approach was successful.

Chunking Content

The next question was how to process the data: should it be divided into blocks of paragraphs or into sentences? He decided that the sentence was the most granular level that made sense, because it allowed identifying the most relevant answer within a sentence while still allowing larger paragraph-level embedding units to be built for context and semantic coherence.

But he still had trouble resolving indirect references that used words like “it” or “the,” so he took an additional step to capture context:

“I trained a DistilBERT classifier model that would take a sentence and the preceding sentences, and label which one (if any) it depends upon in order to retain meaning. Therefore, when embedding a statement, I would follow the “chain” backwards to ensure all dependents were also provided in context.

This also had the benefit of labelling sentences that should never be matched, because they were not “leaf” sentences by themselves.”
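The chain-following step described in the quote can be sketched roughly as follows. This is a minimal illustration, not Wilson's code: the `depends_on` list is a hypothetical stand-in for the classifier's output, where each entry is the index of the earlier sentence that a sentence depends on, or `None` if it stands alone. The function name `build_context` is also an assumption made for this example.

```python
from typing import List, Optional


def build_context(sentences: List[str],
                  depends_on: List[Optional[int]],
                  index: int) -> str:
    """Return the sentence at `index`, preceded by every earlier sentence
    it transitively depends on, in document order."""
    chain = []
    seen = set()
    current: Optional[int] = index
    # Walk the dependency "chain" backwards until a standalone sentence is hit.
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = depends_on[current]
    # The chain was collected from the target backwards; restore reading order.
    return " ".join(sentences[i] for i in reversed(chain))


sentences = [
    "RocksDB is an embedded key-value store.",
    "It is optimized for fast storage.",
    "The project is open source.",
]
# Hypothetical labels: sentence 1 depends on 0, sentence 2 depends on 1.
depends_on = [None, 0, 1]
context_for_embedding = build_context(sentences, depends_on, 2)
```

In this sketch, the text that gets embedded for the third sentence includes the two sentences before it, so a reference like “The project” keeps its meaning.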

Identifying The Main Content

A challenge for crawling was developing a way to ignore the non-content parts of a web page in order to index what Google calls the Main Content (MC). What made this challenging was the fact that all websites use different markup to signal the parts of a web page, and although he didn’t mention it, not all websites use semantic HTML, which would make it vastly easier for crawlers to identify where the main content is.

So he relied on HTML tags like the paragraph tag <p> to identify which parts of a web page contained the content and which parts did not.

This is the list of HTML tags he relied on to identify the main content:

blockquote – A quotation
dl – A description list (a list of descriptions or definitions)
ol – An ordered list (like a numbered list)
p – Paragraph element
pre – Preformatted text
table – The element for tabular data
ul – An unordered list (like bullet points)
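A tag-based filter along these lines could look like the sketch below. It is only an illustration built on Python's standard `html.parser` module, not the parser Wilson wrote. It assumes reasonably well-formed markup in which these tags are explicitly closed, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser

# The tags listed above, treated as containers of main content.
CONTENT_TAGS = {"blockquote", "dl", "ol", "p", "pre", "table", "ul"}


class MainContentExtractor(HTMLParser):
    """Collects text that appears inside any of the CONTENT_TAGS."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # how many content tags are currently open
        self.chunks = []    # collected text fragments

    def handle_starttag(self, tag, attrs):
        if tag in CONTENT_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in CONTENT_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only while at least one content tag is open.
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_main_content(html: str) -> list:
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    "<nav><a href='/'>Home</a></nav>"
    "<p>First paragraph.</p>"
    "<ul><li>One</li><li>Two</li></ul>"
    "<footer>Copyright</footer>"
)
main_text = extract_main_content(page)  # ['First paragraph.', 'One', 'Two']
```

Navigation and footer text is dropped because it never sits inside one of the listed tags. Real-world pages with unclosed <p> tags would need more careful handling than this sketch provides.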

Issues With Crawling

Crawling was another part that came with a multitude of problems to solve. For example, he discovered, to his surprise, that DNS resolution was a fairly frequent point of failure. URL type was another issue: he had to exclude from crawling any URL that did not use the HTTPS protocol.

These were some of the challenges:

“They must have https: protocol, not ftp:, data:, javascript:, etc.

They must have a valid eTLD and hostname, and can’t have ports, usernames, or passwords.

Canonicalization is done to deduplicate. All components are percent-decoded then re-encoded with a minimal consistent charset. Query parameters are dropped or sorted. Origins are lowercased.

Some URLs are extremely long, and you can run into rare limits like HTTP headers and database index page sizes.

Some URLs also have strange characters that you wouldn’t think would be in a URL, but will get rejected downstream by systems like PostgreSQL and SQS.”
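Several of the rules quoted above can be approximated with Python's standard `urllib.parse` module. The sketch below is an assumption-laden illustration rather than Wilson's implementation: it sorts query parameters instead of dropping them, does not validate the eTLD (which would require a public suffix list), and the function name `canonicalize` is hypothetical.

```python
from typing import Optional
from urllib.parse import (parse_qsl, quote, unquote, urlencode, urlsplit,
                          urlunsplit)


def canonicalize(url: str) -> Optional[str]:
    """Return a canonical https URL, or None if the URL should be rejected."""
    try:
        parts = urlsplit(url.strip())
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    # Only https is accepted (urlsplit already lowercases the scheme).
    if parts.scheme != "https":
        return None
    # Reject missing hostnames, explicit ports, usernames, and passwords.
    if not parts.hostname or port is not None or parts.username or parts.password:
        return None
    host = parts.hostname  # urlsplit returns the hostname lowercased
    # Percent-decode, then re-encode with one consistent safe charset.
    # Simplification: an encoded slash (%2F) is decoded to a literal slash.
    path = quote(unquote(parts.path), safe="/") or "/"
    # Sort query parameters so equivalent URLs compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment is dropped because it never reaches the server.
    return urlunsplit(("https", host, path, query, ""))


canonical = canonicalize("HTTPS://Example.COM/a%20b/?b=2&a=1#frag")
# canonical == "https://example.com/a%20b/?a=1&b=2"
```

Two URLs that differ only in letter case of the origin, parameter order, or fragment map to the same string here, which is what makes deduplication possible.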

Storage

At first, Wilson chose Oracle Cloud because of the low cost of transferring data out (egress costs).

He explained:

“I initially chose Oracle Cloud for infra needs due to their very low egress costs with 10 TB free per month. As I’d store terabytes of data, this was a good reassurance that if I ever needed to move or export data (e.g. processing, backups), I wouldn’t have a hole in my wallet. Their compute was also far cheaper than other clouds, while still being a reliable major provider.”

But the Oracle Cloud solution ran into scaling issues. So he moved the project over to PostgreSQL, experienced a different set of technical issues, and eventually landed on RocksDB, which worked well.

He explained:

“I opted for a fixed set of 64 RocksDB shards, which simplified operations and client routing, while providing enough distribution capacity for the foreseeable future.

…At its peak, this system could ingest 200K writes per second across thousands of clients (crawlers, parsers, vectorizers). Each web page not only consisted of raw source HTML, but also normalized data, contextualized chunks, hundreds of high dimensional embeddings, and lots of metadata.”
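Routing with a fixed shard count is simple because every client can compute the target shard on its own. The sketch below shows one way this could work. The article does not say how keys were mapped to shards, so the use of SHA-256 and the function name `shard_for` are assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 64  # the fixed shard count mentioned in the quote


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key (for example, a canonical URL) to a shard number."""
    # A stable hash is used instead of Python's built-in hash(), which is
    # randomized per process and would route the same key differently.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


shard = shard_for("https://example.com/")
```

Because the shard count never changes, a key always lands on the same shard and no lookup service or rebalancing logic is needed. The trade-off is that changing the number of shards later would mean moving most of the data.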

GPU

Wilson used GPU-powered inference to generate semantic vector embeddings from crawled web content using transformer models. He initially used OpenAI embeddings via API, but that became expensive as the project scaled. He then switched to a self-hosted inference solution using GPUs from a company called Runpod.

He explained:

“In search of the most cost effective scalable solution, I discovered Runpod, who offer high performance-per-dollar GPUs like the RTX 4090 at far cheaper per-hour rates than AWS and Lambda. These were operated from tier 3 DCs with stable fast networking and lots of reliable compute capacity.”

Lack Of SEO Spam

The software engineer claimed that his search engine had less search spam and used the example of the query “best programming blogs” to illustrate his point. He also pointed out that his search engine could understand complex queries and gave the example of inputting an entire paragraph of content and discovering interesting articles about the topics in the paragraph.

Four Takeaways

Wilson listed many discoveries, but here are four that may be of interest to digital marketers and publishers interested in this journey of creating a search engine:

1. The Size Of The Index Is Important

One of the most important takeaways Wilson learned from two months of building a search engine is that the size of the search index is important because, in his words, “coverage defines quality.”

2. Crawling And Filtering Are Hardest Problems

Although crawling as much content as possible is important for surfacing useful content, Wilson also learned that filtering low quality content was difficult because it required balancing the need for quantity against the pointlessness of crawling a seemingly endless web of useless or junk content. He discovered that a way of filtering out the useless content was necessary.

This is actually the problem that Sergey Brin and Larry Page solved with PageRank. PageRank modeled user behavior, the choices and votes of humans who validate web pages with links. Although PageRank is nearly 30 years old, the underlying intuition remains so relevant today that the AI search engine Perplexity uses a modified version of it for its own search engine.
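The “links as votes” intuition can be illustrated with the classic textbook formulation of the algorithm. The sketch below is a simplified power-iteration version with a conventional damping factor of 0.85. It is not what Google or Perplexity run in production, and the tiny link graph is invented for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute simplified PageRank scores.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: pages "a" and "b" both link to "c"; "c" links to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In this toy graph, "c" ends up with the highest score because two pages vote for it, "a" comes second because the well-linked "c" votes for it, and "b" scores lowest because nothing links to it.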

3. Limitations Of Small-Scale Search Engines

Another takeaway he discovered is that there are limits to how successful a small independent search engine can be. Wilson cited the inability to crawl the entire web as a constraint which creates coverage gaps.

4. Judging Trust And Authenticity At Scale Is Complex

Automatically determining originality, accuracy, and quality across unstructured data is non-trivial.

Wilson writes:

“Determining authenticity, trust, originality, accuracy, and quality automatically is not trivial. …if I started over I would put more emphasis on researching and developing this aspect first.

Infamously, search engines use thousands of signals on ranking and filtering pages, but I believe newer transformer-based approaches towards content evaluation and link analysis should be simpler, cost effective, and more accurate.”

Interested in trying the search engine? You can find it here and you can read the full technical details of how he did it here.

Featured Image by Shutterstock/Red Vector

https://www.searchenginejournal.com/tired-of-seo-spam-software-engineer-creates-a-new-search-engine/553994/

Google Answers Question About Core Web Vitals “Poisoning” via @sejournal, @martinibuster

Someone posted details of a novel negative SEO attack that they said appeared to be a Core Web Vitals performance poisoning attack. Google’s John Mueller and Chrome’s Barry Pollard assisted in figuring out what was going on.

The person posted on Bluesky, tagging Google’s John Mueller and Rick Viscomi, the latter a DevRel Engineer at Google.

They posted:

“Hey we’re seeing a weird type of negative SEO attack that looks like core web vitals performance poisoning, seeing it on multiple sites where it seems like an intentional render delay is being injected, see attached screenshot. Seeing across multiple sites & source countries

..this data is pulled by webvitals-js. At first I thought dodgy AI crawler but the traffic pattern is from multiple countries hitting the same set of pages and forging the referrer in many cases”

The significance of the reference to “webvitals-js” is that the degraded Core Web Vitals data comes from what is hitting the server, actual performance scores recorded on the website itself, not from the CrUX data, which we’ll discuss next.

Could This Affect Rankings?

The person making the post did not say if the “attack” had impacted search rankings, although that is unlikely, given that website performance is a weak ranking factor and less important than things like content relevance to user queries.

Google’s John Mueller responded, sharing his opinion that it’s unlikely to cause an issue, and tagging Chrome Web Performance Developer Advocate Barry Pollard (@tunetheweb) in his response.

Mueller said:

“I can’t imagine that this would cause issues, but maybe @tunetheweb.com has seen things like this or would be keen on taking a look.”

Barry Pollard wondered if it’s a bug in the web-vitals library and asked the original poster if it’s reflected in the CrUX data (Chrome User Experience Report), which is a record of actual user visits to websites. The person responded by saying that the degradation in core web vitals scores is not reflected in the CrUX data.

DoS (Denial-Of-Service) Attack

The person who posted about the issue also mentioned that the site under discussion was experiencing a DoS attack.

They wrote:

“Hard to get a clear picture because the on top of the LCP issue the site is being hit with some kind of cache-bypass DOS attack that jacked up TTFB & has had the hosting maxxed out…”

A cache-bypass DoS (denial-of-service) attack is when an attacker sends a massive number of web page requests that bypass the cache, whether that’s a CDN or a local cache. This forces the origin server to generate and serve each page itself instead of handing out a cached copy, which strains server resources and slows the server down.

The local web-vitals script is recording the performance degradation of those visits, but it is likely not registering with the CrUX data because that comes from actual Chrome browser users who have opted in to sharing their web performance data.

So What’s Going On?

Judging by the limited information in the discussion, it appears that a DoS attack is slowing down server response times, which in turn is affecting page speed metrics on the server. The Chrome User Experience Report (CrUX) data is not reflecting the degraded response times, which could be because the CDN is handling the page requests for the users recorded in CrUX. There’s a remote chance that the CrUX data isn’t fresh enough to reflect recent events but it seems logical that users are getting cached versions of the web page and thus not experiencing degraded performance.

I think the bottom line is that CWV scores themselves will not have an effect on rankings. Given that actual users themselves will hit the cache layer if there’s a CDN, the DoS attack probably won’t have an effect on rankings in an indirect way either.

Featured Image by Shutterstock/mentalmind

https://www.searchenginejournal.com/google-answers-question-about-core-web-vitals-poisoning-attack/553883/

Ex-Microsoft SEO Pioneer On Why AI’s Biggest Threat To SEO Isn’t What You Think via @sejournal, @theshelleywalsh

While industry professionals debate the nomenclature of SEO, GEO, or AEO, and whether ChatGPT or Google’s AI Overviews will replace traditional search, a more fundamental shift is happening that could disrupt the industry’s entire business model.

To get a better understanding of this, I spoke to the 25-year veteran and SEO pioneer Duane Forrester to discuss some of his recent articles about the shift from traditional SEO and the impact on how SEO roles are changing and adapting.

Duane previously worked at Microsoft as a senior program manager of SEO, where he helped to launch Bing Webmaster Tools and bring Schema.org to life. He has a deep understanding of how search engines work and has now turned his attention to adapting to the realities of AI-powered search and digital discovery.

His belief is that the real disruption isn’t AI replacing search engines; it’s the rise of AI agents. These “Agentic AI” systems will empower individuals to work like small agencies, and the jobs that thrive will be those that can effectively manage an AI team.

The Rise Of Agentic AI: Virtual Team Members

In Duane’s recent article “SEO’s Existential Threat is AI, but Not in the Way You Think,” he said it’s the rise of AI agents and retrieval-based systems that are already transforming how people interact with information, quietly eroding SEO’s return on investment. So, I asked him how agents and not SERPs are the future.

Duane explained:

“The most significant development isn’t AI replacing search engines; it’s the emergence of Agentic AI systems that can be given tasks and execute them autonomously … This is really a personal thing and I’ve been following this since I worked at Microsoft. I did some early work with Cortana with that program and training it for language recognition.”

Within six months, Duane predicts, professionals will routinely instruct AI agents to perform work while they focus on higher-value activities. The effect will be that individuals can operate much more like a small agency.

“If I can create a process and the process is largely executed by agents, then the 100% of my time that I can devote can be reapportioned to human-in-the-loop analysis.

This is going to be the way for us to create virtual players on our team and to do specific tasks to enable us to define the most valuable use of our time, whatever it happens to be. That valuable use of time for some people may be closing their next client. It may simply be the sales cycle. For other people who, maybe, lack knowledge and experience, it may actually be executing on what you promised the client.”

However, Duane thinks that developing people management skills will be critical to success:

“If you step into the world of Agentic AI and you’re going down that path, you better have people management skills because you’re going to need them. That’s the skill set that will prove most valuable to managing Agentic AI work. You have to think of them not necessarily as humans, but as systems that need guidance.”

The Job Transformation: Writers As AI Instructors

I then asked Duane about his latest article, where he wrote about which SEO jobs AI will reshape and which might disappear.

He responded that the most dramatic changes will impact content creators, but not in the way many expect.

Duane thinks that traditional writing roles face automation, but professionals who adapt will become more valuable than ever.

“If your full-time job is sitting down writing, that’s in jeopardy,” Duane acknowledges.

“The new model transforms writers from creators to instructors, managing multiple AI agents across different clients simultaneously. Instead of spending hours researching and writing, professionals can brief a dozen agents in minutes and focus on editing, refining tone, and ensuring accuracy.”

“You can tell a dozen agents for a dozen clients to all start and you can get them all started in less than two minutes and then in about 10 minutes have all of the output that you now will go in and edit one by one.”

Paradoxically, he thinks the role most in demand will be quality experienced writers, but only those who learn how to embrace and integrate AI to be efficient and effectively manage an AI team of writers.

By becoming a “human in the loop” editor who can guide AI output, an experienced writer can add value in ways machines can’t by refining tone, ensuring factual accuracy, and aligning copy with brand voice and client needs.

“I recently wrote about a Microsoft survey that showed the overlay of how AI can do a job versus humans doing that same job … their point was, if you’re in these jobs, you kind of want to figure out how to pivot to something different.”

Strategic Roles Remain Safe

The jobs that are vulnerable to AI are those with a repetitive nature that can be done by an AI faster, easier, and cheaper than a human.

While these execution-focused roles face disruption, strategic positions like CMOs remain relatively protected. These roles survive because they require experience-based decision-making that AI cannot replicate.

“It’s going to be harder to replace that level of experience because the system doesn’t have the experience,” Duane emphasizes.

The distinction isn’t about seniority but about the nature of the work. Repetitive tasks get automated first, while roles requiring strategic thinking, relationship building, and complex problem-solving remain human-dominated.

CMOs are considered “safe” not because they are senior, but because they are thinking in terms of strategy. They succeed by analyzing consumer behavior, identifying monetization opportunities, and aligning products with customer problems, capabilities that demand human insight and industry knowledge.

“They’re watching consumer behavior, and they’re trying to tease out from the consumer behavior: How do we make money from that? How do we align our product to solve a customer’s problem? And then that generates more sales. That’s the job of the CMO.

And then everything else under it, which is building and maintaining the team, running all the groups, and making sure everything is on track. It’s going to be harder to replace that level of experience because the system doesn’t have the experience.”

Preparing For The Future

Success in these evolving times requires immediate action on hiring and training. Companies must update job descriptions today to reflect skills needed in two to three years, or develop comprehensive training programs for existing staff.

“The people you’re hiring today, in theory, should still be with you in a couple of years. And if they are still with you in a couple of years and you don’t hire these new skills today, well then, you better have a training plan to get them there.”

I compared the current transformation with the early days of SEO, when pioneers navigated uncharted territory. Today’s professionals face a similar challenge of adapting to work alongside AI systems or risking obsolescence.

The future belongs to those who can embrace AI as a productivity multiplier rather than a replacement threat. Those who learn to instruct, guide, and optimize AI agents will find themselves more valuable than ever, while those who resist change may find themselves left behind.

“This isn’t just about surviving disruption,” Duane concluded. “It’s about positioning yourself to benefit from it.”

Duane is currently writing about the shift from traditional SEO to vector-driven retrieval and AI-generated answers at Duane Forrester Decodes and featured here at Search Engine Journal.

Thank you to Duane for offering his insights and being my guest on IMHO.

Featured Image: Shelley Walsh/Search Engine Journal

https://www.searchenginejournal.com/ex-microsoft-seo-pioneer-on-why-ais-biggest-threat-to-seo-isnt-what-you-think/553496/

Google Explains Why They Need To Control Ranking Signals via @sejournal, @martinibuster

Google’s Gary Illyes answered a question about why Google doesn’t use social sharing as a ranking factor, explaining that it’s about the inability to control certain kinds of external signals.

Kenichi Suzuki Interview With Gary Illyes

Kenichi Suzuki, of Faber Company, is a respected Japanese search marketing expert who has at least 25 years of experience in digital marketing. I last saw him speak at a Pubcon session a few years back, where he shared his findings on qualities inherent to sites that Google Discover tended to show.

Suzuki published an interview with Gary Illyes that covered a number of questions about SEO, including this one about social media and Google ranking factors.

Gary Illyes is an Analyst at Google who has a history of giving straightforward answers that dispel SEO myths and sometimes startle, like the time recently when he said that links play less of a role in ranking than most SEOs tend to believe. Gary used to be a part of the web publishing community before working at Google, and he was even a member of the WebmasterWorld forums under the nickname Methode. So I think Gary knows what it’s like to be a part of the SEO community and how important good information is, and that’s reflected in the quality of answers he provides.

Are Social Media Shares Or Views Google Ranking Factors?

The question about social media and ranking factors was asked by Rio Ichikawa, also of Faber Company. She asked Gary whether social media views and shares were ranking signals.

Gary’s answer was straightforward and unambiguous. He said no. The interesting part of his answer was the explanation of why Google doesn’t use them and is unlikely to use them as a ranking factor in the future.

Ichikawa asked the following question:

“All right then. The next question. So this is about the SEO and social media. Is the number of the views and shares on social media …used as one of the ranking signals for SEO or in general?”

Gary answered:

“For this we have basically a very old, very canned response and something that we learned or it’s based on something that we learned over the years, or particularly one incident around 2014.

The answer is no. And for the future is also likely no.

And that’s because we need to be able to control our own signals. And if we are looking at external signals, so for example, a social network’s signals, that’s not in our control.

So basically if someone on that social network decides to inflate the number, we don’t know if that inflation was legit or not, and we have no way knowing that.”

Easily Gamed Signals Are Unreliable For SEO

External signals that Google can’t control but that an SEO can influence are untrustworthy. Googlers have expressed similar opinions about other things that are easily manipulated and therefore unreliable as ranking signals.

Some SEOs might say, “If that’s true, then what about structured data? Those are under the control of SEOs, but Google uses them.”

Yes, Google uses structured data, but not as a ranking factor; it just makes websites eligible for rich results. Additionally, stuffing structured data with content that’s not visible on the web page is a violation of Google’s guidelines and can lead to a manual action.

A recent example is the LLMs.txt protocol proposal, which is essentially dead in the water precisely because it is unreliable, in addition to being superfluous. Google’s John Mueller has said that the LLMs.txt protocol is unreliable because it could easily be misused to show highly optimized content for ranking purposes, and that it is analogous to the keywords meta tag, which was used by SEOs for every keyword they wanted their web pages to rank for.

Mueller said:

“To me, it’s comparable to the keywords meta tag – this is what a site-owner claims their site is about … (Is the site really like that? well, you can check it. At that point, why not just check the site directly?)”

The content within an LLMs.txt file and its associated files is completely under the control of SEOs and web publishers, which makes it unreliable.

Another example is the author byline. Many SEOs promoted author bylines as a way to show “authority” and influence Google’s understanding of Experience, Expertise, Authoritativeness, and Trustworthiness. Some SEOs, predictably, invented fake LinkedIn profiles to link from their fake author bios in the belief that author bylines were a ranking signal. The irony is that the ease of abusing author bylines should have been reason enough for the average SEO to dismiss them as a ranking-related signal.

In my opinion, the key statement in Gary’s answer is this:

“…we need to be able to control our own signals.”

I think that the SEO community, moving forward, really needs to rethink some of the unconfirmed “ranking signals” they believe in, like brand mentions, and just move on to doing things that actually make a difference, like promoting websites and creating experiences that users love.

The question and answer come at about the ten-minute mark of the video interview.

Featured Image by Shutterstock/pathdoc

https://www.searchenginejournal.com/google-explains-why-they-need-to-control-their-ranking-signals/553657/

Google Rolls Out ‘Preferred Sources’ For Top Stories In Search via @sejournal, @MattGSouthern

Google is rolling out a new setting that lets you pick which news outlets you want to see more often in Top Stories.

The feature, called Preferred Sources, is launching today in English in the United States and India, with broader availability in those markets over the next few days.

What’s Changing

Preferred Sources lets you choose one or more outlets that should appear more frequently when they have fresh, relevant coverage for your query.

Google will also show a dedicated From your sources section on the results page. You will still see reporting from other publications, so Top Stories remains a mix of outlets.

Google Product Manager Duncan Osborn says the goal is to help you “stay up to date on the latest content from the sites you follow and subscribe to.”

How To Turn It On

Image Credit: Google

1. Search for a topic that is in the news.
2. Tap the icon to the right of the Top stories header.
3. Search for and select the outlets you want to prioritize.
4. Refresh the results to see the updated mix.

You can update your selections at any time. If you previously opted in to the experiment through Labs, your saved sources will carry over.

In early testing through Labs, more than half of participants selected four or more sources. That suggests people value seeing a range of outlets while still leaning toward publications they trust.

Why It Matters

For publishers, Preferred Sources creates a direct way to encourage loyal readers to see more of your coverage in Search.

Loyal audiences are more likely to add your site as a preferred source, which can increase the likelihood of showing up for them when you have fresh, relevant reporting.

You can point your audience to the new setting and explain how to add your site to their list. Google has also published help resources for publishers that want to promote the feature to followers and subscribers.

This adds another personalization layer on top of the usual ranking factors. Google says you will still see a diversity of sources, and that outlets only appear more often when they have new, relevant content.

Looking Ahead

Preferred Sources fits into Google’s push to let you customize Search while keeping a variety of perspectives in Top Stories.

If you have a loyal readership, this feature is another reason to invest in retention and newsletters, and to make it easy for readers to follow your coverage on and off Search.

https://www.searchenginejournal.com/google-rolls-out-preferred-sources-for-top-stories-in-search/553529/

Google Says AI-Generated Content Will Not Cause Ranking Penalty via @sejournal, @martinibuster

Google’s Gary Illyes recently answered the question of whether AI-generated images used together with “legit” content can impact rankings. Gary discussed whether it had an impact on SEO and called attention to a technical issue involving server resources that is a possible outcome.

Does Google Penalize for AI-Generated Content?

How does Google react to AI image content when it’s encountered in the context of a web page? Google’s Gary Illyes answered that question within the context of a Q&A and offered some follow-up observations about how it could lead to extra traffic from Google Image Search. The question was asked at about the ten-minute mark of the interview conducted by Kenichi Suzuki and published on YouTube.

This is the question that was asked:

“Say if there’s a content that the content itself is legit, the sentences are legit but and also there are a lot of images which are relevant to the content itself, but all of them, let’s say all of them are generated by AI. Will that content or the overall site, is it going to be penalized or not?”

This is an important and reasonable question because Google ran an update about a year ago that appeared to de-rank low quality AI-generated content.

Google’s Gary Illyes’ answer was clear that AI-generated content will not result in penalization and that it has no direct impact on SEO.

He answered:

“No, no. So AI generated image doesn’t impact the SEO. Not direct.

So obviously when you put images on your site, you will have to sacrifice some resources to those images… But otherwise you are not going to, I don’t think that you’re going to see any negative impact from that.

If anything, you might get some traffic out of image search or video search or whatever, but otherwise it should just be fine.”

AI-Generated Content

Gary Illyes did not discuss authenticity; however, it’s a good thing to consider in the context of using AI-generated content. Authenticity is an important quality for users, especially in contexts where there is an expectation that an illustration is a faithful depiction of an actual outcome or product. For example, users expect product illustrations to accurately reflect the products they are purchasing and images of food to reasonably represent the completed dishes after following the recipe instructions.

Google often says that content should be created for users and that many questions about SEO are adequately answered by the context of how users will react to it. Illyes did not reflect on any of that, but it is something that publishers should consider if they care about how content resonates with users.

Gary’s answer makes it clear that AI-generated content will not have a negative impact on SEO.

The Q&A comes at about the ten-minute mark of the video interview.

Featured Image by Shutterstock/Besjunior

https://www.searchenginejournal.com/google-says-ai-generated-content-will-not-cause-ranking-penalty/553465/

Google Web Guide: How It’s Reshaping The SERP And What It Means For Your SEO Strategy via @sejournal, @cyberandy

For decades, the digital world has been defined by hyperlinks, a simple, powerful way to connect documents across a vast, unstructured library. Yet, the foundational vision for the web was always more ambitious.

It was a vision of a Semantic Web, a web where the relationships between concepts are as important as the links between pages, allowing machines to understand the context and meaning of information, not just index its text.

With its latest Search Labs experiment, Web Guide (that got me so excited), Google is taking an important step in this direction.

Google’s Web Guide is designed to make it easier to find information, not just webpages. It is optimized as an alternative to AI Mode and AI Overviews for tackling complex, multi-part questions or exploring a topic from multiple angles.

Built using a customized version of the Gemini AI model, Web Guide organizes search results into helpful, easy-to-browse groups.

This is a pivotal moment. It signals that the core infrastructure of search is now evolving to natively support the principle of semantic understanding.

Web Guide represents a shift away from a web of pages and average rankings and toward a web of understanding and hyper-personalization.

This article deconstructs the technology behind Web Guide, analyzes its dual impact on publishers, and outlines a possible new playbook for the era of SEO, or Generative Engine Optimization (GEO) if you like.

I personally don’t see Web Guide as just another feature; I see it as a glimpse into the future of how knowledge will be discovered and consumed.

How Google’s Web Guide Works: The Technology Behind The Hyper-Personalized SERP

At its surface, Google Web Guide is a visual redesign of the search results page. It replaces the traditional, linear list of “10 blue links” with a structured mosaic of thematic content.

For an exploratory search like [how to solo travel in Japan], a user might see distinct, expandable clusters for “comprehensive guides,” “personal experiences,” and “safety recommendations.”

This allows users to immediately drill down into the facet of their query that is most relevant to them.

But the real revolution is happening behind the scenes. This curation is powered by a custom version of Google’s Gemini model, and the key to its effectiveness is a technique known as “query fan-out.”

When a user enters a query, the AI doesn’t just search for that exact phrase. Instead, it deconstructs the user’s likely intent into a series of implicit, more specific sub-queries, “fanning out” to search for them in parallel.

For the “solo travel in Japan” query, the fan-out might generate internal searches for “Japan travel safety for solo women,” “best blogs for Japan travel,” and “using the Japan Rail Pass.”

By casting this wider net, the AI gathers a richer, more diverse set of results. It then analyzes and organizes these results into the thematic clusters presented to the user. This is the engine of hyper-personalization.
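To make the fan-out idea concrete, here is a minimal Python sketch. Google has not published how Web Guide is implemented, so everything in it is an assumption for illustration: the `SUB_QUERIES` mapping stands in for the model's intent decomposition, `fake_search` and `TOY_INDEX` stand in for a real retrieval backend, and the pairing of sub-queries with cluster labels is my own guess based on the example above.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the model's intent decomposition:
# cluster label -> sub-query issued for that facet.
SUB_QUERIES = {
    "safety recommendations": "Japan travel safety for solo women",
    "personal experiences": "best blogs for Japan travel",
    "comprehensive guides": "using the Japan Rail Pass",
}

# Hypothetical toy index standing in for a real search backend.
TOY_INDEX = {
    "Japan travel safety for solo women": ["example.com/safety"],
    "best blogs for Japan travel": ["example.org/blog-a", "example.net/blog-b"],
    "using the Japan Rail Pass": ["example.com/rail-pass"],
}


def fake_search(sub_query: str) -> list[str]:
    """Return URLs for one sub-query (assumed backend, not a real API)."""
    return TOY_INDEX.get(sub_query, [])


def fan_out(sub_queries: dict[str, str]) -> dict[str, list[str]]:
    """Run every sub-query in parallel and group results by cluster label."""
    labels = list(sub_queries)
    with ThreadPoolExecutor() as pool:
        # map() preserves input order, so results line up with labels.
        results = list(pool.map(fake_search, (sub_queries[label] for label in labels)))
    return dict(zip(labels, results))


clusters = fan_out(SUB_QUERIES)
for label, urls in clusters.items():
    print(f"{label}: {urls}")
```

The point of the sketch is the shape of the process: one query becomes several, each is retrieved independently, and the page is assembled from the grouped results rather than from a single ranked list.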

The SERP is no longer a one-size-fits-all list; it’s a dynamically generated, personalized guide built to match the multiple, often unstated, intents of a specific user’s query. (Here is the early analysis I did by analyzing the network traffic – HAR file – behind a request.)

To visualize how this works in semantic terms, let’s consider the query “things to know about running on the beach,” which the AI breaks down into the following facets:

Screenshot from search for [things to know about running on the beach], Google, August 2025

Image from author, August 2025

The Web Guide UI is composed of several elements designed to provide a comprehensive and personalized experience:

Main Topic: The central theme or query that the user has entered.
Branches: The main categories of information generated in response to the user’s query. These branches are derived from various online sources to provide a well-rounded overview.
Sites: The specific websites from which the information is sourced. Each piece of information within the branches is attributed to its original source, including the entity name and a direct URL.
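The three elements above can be modeled as a small nested data structure. This is only a sketch for reasoning about the layout: the class names, field names, branch titles, and URLs are hypothetical and do not come from Google.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    entity_name: str  # publisher or brand name shown in the UI
    url: str          # direct link to the source page


@dataclass
class Branch:
    title: str  # category generated for the query
    sites: list[Site] = field(default_factory=list)


@dataclass
class WebGuideResult:
    main_topic: str  # the query the user entered
    branches: list[Branch] = field(default_factory=list)

    def entities_by_branch(self) -> dict[str, list[str]]:
        """Map each branch title to the entity names cited under it."""
        return {b.title: [s.entity_name for s in b.sites] for b in self.branches}


# Placeholder data; branch titles and sites are invented for illustration.
guide = WebGuideResult(
    main_topic="things to know about running on the beach",
    branches=[
        Branch("Footwear", [Site("Example Running Blog", "https://example.com/shoes")]),
        Branch("Technique", [Site("Example Coaching Site", "https://example.org/form")]),
    ],
)
print(guide.entities_by_branch())
```

Viewed this way, a publisher's visibility is a question of how many branches list one of its pages, not of a single position in a ranked list.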

Let’s review Web Guide in the context of Google’s other AI initiatives.

Feature	Primary Function	Core Technology	Impact on Web Links
AI Overviews	Generate a direct, synthesized answer at the top of the SERP.	Generative AI, Retrieval-Augmented Generation.	High negative impact. Designed to reduce clicks by providing the answer directly. It is replacing featured snippets, as recently demonstrated by Sistrix for the UK market.
AI Mode	Provide a conversational, interactive, generative AI experience.	Custom version of Gemini, query fan-out, chat history.	High negative impact. Replaces traditional results with a generated response and mentions.
Web Guide	Organize and categorize traditional web link results.	Custom version of Gemini, query fan-out.	Moderate/Uncertain impact. Aims to guide clicks to more relevant sources.

Web Guide’s unique role is that of an AI-powered curator or librarian.

It adds a layer of AI organization while preserving the fundamental link-clicking experience, making it a strategically distinct and potentially less contentious implementation of AI in search.

The Publisher’s Conundrum: Threat Or Opportunity?

The central concern surrounding any AI-driven search feature is the potential for a severe loss of organic traffic, the economic lifeblood of most content creators. This anxiety is not speculative.

Cloudflare’s CEO has publicly criticized these moves as another step in “breaking publishers’ business models,” a sentiment that reflects deep apprehension across the digital content landscape.

This fear is contextualized by the well-documented impact of Web Guide’s sibling feature, AI Overviews.

A critical study by the Pew Research Center revealed that the presence of an AI summary at the top of a SERP dramatically reduces the likelihood that a user will click on an organic link, a nearly 50% relative drop in click-through rate in its analysis.
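Because “relative drop” is easy to confuse with a drop in percentage points, here is a short calculation showing the difference. The two click rates are illustrative inputs chosen to land near the figure cited above; treat them as assumptions rather than numbers quoted from the study.

```python
def relative_drop(baseline_rate: float, new_rate: float) -> float:
    """Fractional decline of new_rate relative to baseline_rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Illustrative inputs (assumptions for this example).
ctr_without_summary = 0.15
ctr_with_summary = 0.08

drop = relative_drop(ctr_without_summary, ctr_with_summary)
points = (ctr_without_summary - ctr_with_summary) * 100
print(f"absolute drop: {points:.0f} percentage points")
print(f"relative drop: {drop:.1%}")
```

A change of only a few percentage points can therefore correspond to losing close to half of the clicks a page would otherwise have received.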

Google has mounted a vigorous defense, claiming it has “not observed significant drops in aggregate web traffic” and that the clicks that do come from pages with AI Overviews are of “higher quality.”

Amid this, Web Guide presents a more nuanced picture. There is a credible argument that, by preserving the link-clicking paradigm, it could be a more publisher-friendly application of AI.

Its “query fan-out” technique could benefit high-quality, specialized content that has struggled to rank for broad keywords.

In this optimistic view, Web Guide acts as a helpful librarian, guiding users to the right shelf in the library rather than just reading them a summary at the front desk.

However, even this more “link-friendly” approach cedes immense editorial control to an opaque algorithm, making the ultimate impact on net traffic uncertain to say the least.

The New Playbook: Building For The “Query Fan-Out”

The traditional goal of securing the No. 1 ranking for a specific keyword is rapidly becoming outdated and insufficient.

In this new landscape, visibility is defined by contextual relevance and presence within AI-generated clusters. This requires a new strategic discipline: Generative Engine Optimization (GEO).

GEO expands the focus from optimizing for crawlers to optimizing for discoverability within AI-driven ecosystems.

The key to success in this new paradigm lies in understanding and aligning with the “query fan-out” mechanism.

Pillar 1: Build For The “Query Fan-Out” With Topical Authority

The most effective strategy is to pre-emptively build content that maps directly to the AI’s likely “fan-out” queries.

This means deconstructing your areas of expertise into core topics and constituent subtopics, and then building comprehensive content clusters that cover every facet of a subject.

This involves creating a central “pillar” page for a broad topic, which then links out to a “constellation” of highly detailed, dedicated articles that cover every conceivable sub-topic.

For “things to know about running on the beach” (the example above), a publisher should create a central guide that links to individual, in-depth articles such as “The Benefits and Risks of Running on Wet vs. Dry Sand,” “What Shoes (If Any) Are Best for Beach Running?,” “Hydration and Sun Protection Tips for Beach Runners,” and “How to Improve Your Technique for Softer Surfaces.”

By creating and intelligently interlinking this content constellation, a publisher signals to the AI that their domain possesses comprehensive authority on the entire topic.

This dramatically increases the probability that when the AI “fans out” its queries, it will find multiple high-quality results from that single domain, making it a prime candidate to be featured across several of Web Guide’s curated clusters.
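One practical way to work with this pillar is to audit a content cluster against the sub-queries you expect a fan-out to produce. The sketch below is a simple gap check, not a reconstruction of anything Google does: the sub-query list, the URL paths, and the manual mapping between them are all hypothetical.

```python
# Hypothetical fan-out sub-queries a publisher expects for the pillar topic.
expected_sub_queries = [
    "running on wet vs dry sand",
    "best shoes for beach running",
    "hydration and sun protection for beach runners",
    "running technique for soft surfaces",
    "beach running injury risks",
]

# Hypothetical content inventory: article URL path -> sub-queries it answers.
content_cluster = {
    "/beach-running/wet-vs-dry-sand": ["running on wet vs dry sand"],
    "/beach-running/shoes": ["best shoes for beach running"],
    "/beach-running/hydration-sun": ["hydration and sun protection for beach runners"],
    "/beach-running/technique": ["running technique for soft surfaces"],
}


def coverage_gaps(sub_queries, cluster):
    """Return sub-queries that no article in the cluster addresses."""
    covered = {q for queries in cluster.values() for q in queries}
    return [q for q in sub_queries if q not in covered]


def pillar_links(cluster):
    """URL paths the pillar page should link to, one per cluster article."""
    return sorted(cluster)


gaps = coverage_gaps(expected_sub_queries, content_cluster)
print("Missing articles for:", gaps)
print("Pillar page should link to:", pillar_links(content_cluster))
```

In practice the mapping from articles to sub-queries would come from editorial judgment or from query data, but even a manual version like this makes missing facets of a topic visible.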

This strategy must be built upon Google’s established E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness) principles, which are amplified in an AI-driven environment.

Pillar 2: Master Technical & Semantic SEO For An AI Audience

While Google states there are no new technical requirements for AI features, the shift to AI curation elevates the importance of existing best practices.

Structured Data (Schema Markup): This is now more critical than ever. Structured data acts as a direct line of communication to AI models, explicitly defining the entities, properties, and relationships within your content. It makes content “AI-readable,” helping the system understand context with greater precision. This could mean the difference between being correctly identified as a “how-to guide” versus a “personal experience blog,” and thus being placed in the appropriate cluster.
Foundational Site Health: The AI model needs to see a page the same way a user does. A well-organized site architecture, with clean URL structures that group similar topics into directories, provides strong signals to the AI about your site’s topical structure. Crawlability, a good page experience, and mobile usability are essential prerequisites for competing effectively.
Write with semiotics in mind: As Gianluca Fiorelli would say, focus on the signals behind the message. AI systems now rely on hybrid chunking; they break content into meaning-rich segments that combine text, structure, visuals, and metadata. The clearer your semiotic signals (headings, entities, structured data, images, and relationships), the easier it is for AI to interpret the purpose and context of your content. In this AI-gated search environment, meaning and context have become your new keywords.
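As an illustration of the structured data point, the following sketch builds schema.org `Article` markup as JSON-LD. The `@type`, `headline`, `url`, `author`, and `about` properties are part of the schema.org vocabulary; the helper function name, the URL, and the author are placeholders, and nothing here implies that this markup guarantees placement in a particular Web Guide cluster.

```python
import json


def article_jsonld(headline: str, url: str, author: str, topics: list[str]) -> str:
    """Build a schema.org Article JSON-LD string for embedding in a page."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "author": {"@type": "Person", "name": author},
        # "about" lists the entities the article covers.
        "about": [{"@type": "Thing", "name": t} for t in topics],
    }
    return json.dumps(data, indent=2)


markup = article_jsonld(
    headline="What Shoes (If Any) Are Best for Beach Running?",
    url="https://example.com/beach-running/shoes",  # placeholder URL
    author="Jane Doe",                               # placeholder author
    topics=["Beach running", "Running shoes"],
)
# The result would be embedded in a <script type="application/ld+json"> tag.
print(markup)
```

Declaring the entities explicitly in `about` is the part most relevant here, since it states the page's topics instead of leaving them to be inferred from the text.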

The Unseen Risks: Bias In The Black Box

A significant criticism of AI-driven systems like Web Guide lies in their inherent opacity. These “black boxes” pose a formidable challenge to accountability and fairness.

The criteria by which the Gemini model decides which categories to generate and which pages to include are not public, raising profound questions about the equity of the curation process.

There is a significant risk that the AI will not only reflect but also amplify existing societal and brand biases. A useful way to test the fairness of Web Guide is to review how it handles complex issues.

Screenshot from search for [Are women more likely to be prescribed antidepressants for physical symptoms?], Google, August 2025

Medical diagnostic queries are complex and can easily reveal biases.

Screenshot from search for [Will AI eliminate most white-collar jobs?], Google, July 2025

Once again, user-generated content (UGC) is used, and it might not always strike the right balance between doom narratives and overly optimistic positions.

Since the feature is built upon the same core systems as traditional Search, it is highly probable that it will perpetuate existing biases.

Conclusion: The Age Of The Semantic AI-Curated Web

Google’s Web Guide is not a temporary UI update; it is a manifestation of a deeper, irreversible transformation in information discovery.

It represents Google’s attempt to navigate the passage between the old world of the open, link-based web and the new world of generative, answer-based AI.

The “query fan-out” mechanism is the key to understanding its impact and the new strategic direction. For all stakeholders, adaptation is not optional.

The strategies that guaranteed success in the past are no longer sufficient. The core imperatives are clear: Embrace topical authority as a direct response to the AI’s mechanics, master the principles of Semantic SEO, and prioritize the diversification of traffic sources. The era of the 10 blue links is over.

The era of the AI-curated “chunks” has begun, and success will belong to those who build a deep, semantic repository of expertise that AI can reliably understand, trust, and surface.

Featured Image: NicoElNino/Shutterstock

https://www.searchenginejournal.com/google-web-guide-reshaping-the-serp-and-what-it-means-for-your-seo-strategy/552827/

Google Is Testing An AI-Powered Finance Page via @sejournal, @martinibuster

Google announced that they’re testing a new AI-powered Google Finance tool. The new tool enables users to ask natural language questions about finance and stocks, get real-time information about financial and cryptocurrency topics, and access new charting tools that visualize the data.

Three Ways To Access Data

Google’s AI finance page offers three ways to explore financial data:

Research
Charting Tools
Real-Time Data And News

Screenshot Of Google Finance

The screenshot above shows a watchlist panel on the left, a chart in the middle, a “latest updates” section beneath that, and a “research” section in the right-hand panel.

Research

The new finance page enables users to ask natural language questions about finance, including the stock market, and the AI will return comprehensive answers, plus links to the websites where the relevant answers can be found.

Closeup Screenshot Of Research Section

Charting Tools

Google’s finance page also features charting tools that enable users to visualize financial data.

According to Google:

“New, powerful charting tools will help you visualize financial data beyond simple asset performance. You can view technical indicators, like moving average envelopes, or adjust the display to see candlestick charts and more.”
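For readers unfamiliar with the indicator Google mentions, a moving average envelope is commonly defined as a moving average with bands a fixed percentage above and below it. The sketch below follows that common definition with a simple moving average. The window, the percentage, and the prices are made-up values, and Google has not said which variant its charts use.

```python
def simple_moving_average(prices: list[float], window: int) -> list[float]:
    """SMA for each position that has a full window of prices behind it."""
    if window <= 0 or window > len(prices):
        raise ValueError("window must be between 1 and len(prices)")
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


def envelope(prices, window=3, pct=0.05):
    """Return (lower, middle, upper) bands: the SMA shifted by +/- pct."""
    middle = simple_moving_average(prices, window)
    lower = [m * (1 - pct) for m in middle]
    upper = [m * (1 + pct) for m in middle]
    return lower, middle, upper


closes = [100.0, 102.0, 104.0, 103.0, 105.0]  # made-up closing prices
lower, middle, upper = envelope(closes, window=3, pct=0.05)
for lo, mid, hi in zip(lower, middle, upper):
    print(f"{lo:.2f}  {mid:.2f}  {hi:.2f}")
```

Prices that move outside the bands are what chart readers typically look for, which is why the envelope is drawn around the average instead of around the raw price.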

Real-Time Data

The new finance page also provides real-time data and tools, enabling users to explore finance news, including cryptocurrency information. This part features a live news feed.

The AI-powered page will roll out over the next few weeks on Google.com/finance/.

Read more at Google:

We’re testing a new, AI-powered Google Finance.

Featured Image by Shutterstock/robert_s

https://www.searchenginejournal.com/google-is-testing-an-ai-powered-finance-page/553333/

How To Stay Visible in AI Search [Webinar] via @sejournal, @lorenbaker

AI search is here. Are you ready for the new rules?

The SEO game has changed. Traditional strategies are no longer enough, and some brands are getting lost in the shift to AI-powered search results.

Join Wayne Cichanski on August 20, 2025 for an exclusive webinar sponsored by iQuanti. Learn how to adapt your SEO strategy and site architecture for AI-driven queries and remain competitive in this new search era.

In this session, you’ll discover:

Why user experience, schema, and site architecture are now just as important as keywords
Practical steps to remain visible and competitive in evolving search results
How to position your brand for discovery in AI-driven queries, not just rankings

Why this session is essential:

With generative AI reshaping search results across platforms like Google, Bing, and ChatGPT, it is crucial to rethink how your content is structured and how people interact with your brand in AI search. Do not get left behind. Optimize for AI-driven search now.

Register today for actionable insights and a roadmap to success in the AI search era. If you cannot attend live, do not worry. Sign up anyway and we will send you the full recording.

https://www.searchenginejournal.com/stay-visible-ai-search-iquanti-webinar/552675/

The Future Of Search: 5 Key Findings On What Buyers Really Want via @sejournal, @MattGSouthern

Search is changing, and not just because of Google updates.

Buyers are changing how they find, evaluate, and decide. They are researching in AI summaries, asking questions out loud to their phones, and converting through conversations that happen outside of what most analytics can track.

Our latest ebook, “The Future Of Search: 16 Actionable Pivots That Improve Visibility & Conversions,” explores how marketers are responding to this shift.

It offers a closer look at what it means to optimize for visibility, engagement, and results in a fragmented, AI-influenced search landscape.

Here are five key takeaways.

1. Ranking Well Doesn’t Guarantee Visibility

Getting to the top of search results used to be enough. Today, that’s no longer the case.

AI summaries, voice assistants, and platform-native answers often intercept the buyer before they reach your website.

Even high-ranking content can go unseen if it’s not structured in a way that’s easily digestible by large language models.

For example, research shows AI-generated summaries often prioritize single-sentence answers and structured formats like tables and lists.

Only a small fraction of AI citations rely on exact-match keywords, reinforcing that clarity and context are now more important than repetition.

To stay visible, businesses need to consider how their content is interpreted across multiple AI systems, not just traditional SERPs.

2. Many Conversions Happen Offscreen

Clicks and page views only tell part of the story.

High-intent actions like phone calls, text messages, and offline conversations are often left out of attribution models, yet they play a critical role in decision-making.

These touchpoints are especially common in service-based industries and B2B scenarios where buyers want real interaction.

One case study reveals that a company discovered nearly 90% of their Yelp conversions came through phone calls they weren’t tracking. Another saw appointment bookings spike after attributing organic search traffic to calls rather than clicks.

Our ebook refers to this as the insight gap, and highlights how conversation tracking helps marketers close it.

3. Listening Is More Effective Than Guessing

Marketers have access to more customer input than ever, but much of it goes unused.

Call transcripts, support calls, and chat logs contain the language buyers actually use.

Teams that analyze these conversations are gaining an edge, using real voice-of-customer insights to refine messaging, improve landing pages, and inform campaign strategy.

In one example, a marketing agency increased qualified leads by 67% simply by identifying the specific terminology customers used when asking about their services.
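A rough sketch of how that kind of terminology analysis might start is shown below. It counts the most frequent two-word phrases in a handful of transcripts. The transcripts and the stopword list are invented for the example, and dropping stopwords before forming phrases is a simplification that can join words that were not adjacent in the original sentence.

```python
import re
from collections import Counter

# Made-up transcript snippets standing in for real call or chat logs.
transcripts = [
    "Hi, do you offer same day repair for a leaking water heater?",
    "I need a quote for water heater replacement, is same day repair possible?",
    "My water heater is making noise, how much is a service visit?",
]

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "for", "is", "do", "you", "i", "my",
             "how", "much", "hi", "need"}


def top_phrases(texts, n=2, limit=3):
    """Count the most frequent n-word phrases after dropping stopwords."""
    counts = Counter()
    for text in texts:
        words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
        counts.update(" ".join(words[i : i + n]) for i in range(len(words) - n + 1))
    return counts.most_common(limit)


print(top_phrases(transcripts))
```

Phrases that recur across many conversations are candidates for headings, ad copy, and landing page language, because they reflect how buyers describe the problem in their own words.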

The shift from assumptions to evidence is helping brands prioritize what matters most, and it’s making their campaigns more effective.

4. Paid Search Works Better When It Aligns With Everything Else

Search behavior is not linear, and neither is the buyer journey.

Users often move between organic results, paid ads, and AI-generated suggestions in the same session. The strongest-performing campaigns tend to be the ones that echo the same language and value props across all these touchpoints.

That includes aligning ad copy with real customer concerns, drawing from call transcripts, and building landing pages that reflect the buyer’s stage in the decision process.

It also means rethinking what happens after the click.

5. Attribution Models Are Out Of Step With Reality

Most attribution still assumes that conversions happen on a single screen. That’s rarely true.

A manager might discover your brand in an AI-generated search snippet on a desktop, send the link to themselves in Slack, and later call your sales team from their iPhone after revisiting the content on mobile.

Marketers relying only on last-click attribution may be optimizing based on incomplete or misleading data.

The report makes the case for models that include multi-touch, cross-device, and offline activity to give a fuller picture of what drives conversions.
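The difference between the two approaches is easy to see on the journey described above. The sketch below compares last-click attribution with an even (linear) split across touchpoints, which is only one of several multi-touch models. The channel names and the journey structure are hypothetical labels for that example.

```python
from collections import defaultdict

# Hypothetical journey modeled on the example above, in chronological order.
journey = [
    {"channel": "ai_search_snippet", "device": "desktop"},
    {"channel": "slack_self_share", "device": "desktop"},
    {"channel": "mobile_revisit", "device": "mobile"},
    {"channel": "phone_call", "device": "mobile"},  # offline conversion touch
]


def last_click(touches):
    """Give 100% of the credit to the final touchpoint."""
    return {touches[-1]["channel"]: 1.0}


def linear_multi_touch(touches):
    """Split credit evenly across every touchpoint in the journey."""
    credit = defaultdict(float)
    share = 1.0 / len(touches)
    for touch in touches:
        credit[touch["channel"]] += share
    return dict(credit)


print("last click:", last_click(journey))
print("linear:    ", linear_multi_touch(journey))
```

Under last-click, the AI search snippet that started the journey receives no credit at all, and if the phone call is not tracked either, the conversion disappears from the data entirely.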

This isn’t about tracking more for the sake of it. It’s about making smarter decisions with the signals that matter.

Rethinking Search Starts With Rethinking Buyers

The ebook, written in collaboration with CallRail, offers more than strategy updates. It is a reminder that behind every metric is a person making a decision.

Marketers who succeed in this new environment aren’t just optimizing for rankings or clicks. They are optimizing for how people think, search, and take action.

Download the full report to explore how buyer behavior is reshaping search strategy.

Featured Image: innni/Shutterstock

https://www.searchenginejournal.com/the-future-of-search-key-findings-on-what-buyers-really-want/552476/