“Pretty much everyone knows we’re using clicks in rankings. That’s the debate: ‘Why are you trying to obscure this issue if everyone knows?’”
That quote comes from Eric Lehman, a former 17-year employee of Google who worked as a software engineer on search quality and ranking. He left Google in November.
Lehman testified last Wednesday as part of the ongoing U.S. vs. Google antitrust trial.
If you haven’t heard this quote yet, expect to hear it. A lot.
But. That’s not all Lehman had to say. Google’s machine learning systems BERT and MUM are becoming more important than user data, he said.
- “In one direction, it’s better to have more user data, but new technology and later systems can use less user data. It’s changing pretty fast,” Lehman said, as reported by Law360.
Lehman believes Google will rely more heavily on machine learning that evaluates text than on user data, according to an email he wrote in 2018, as reported by Fortune:
- “Huge amounts of user feedback can be largely replaced by unsupervised learning of raw text,” he wrote.
User vs. training data. There was also confusion around “user data” vs. “training data” when it came to BERT. Big Tech on Trial reported:
“DOJ’s attempt to impeach Lehman’s testimony also seemed to backfire. In response to a DOJ question about whether Google had an advantage in using BERT over competition because of its user data, Lehman testified that Google’s ‘biggest advantage in using BERT’ over its competitors was that Google invented BERT. DOJ then put up an exhibit titled ‘Bullet points for presentation to Sundar.’ One of the bullets on this exhibit said the following (according to my notes): ‘Any competitor can use BERT or similar technologies. Fortunately, our training data gives us a head-start. We have the opportunity to maintain and extend our lead by fully using the training data with BERT and serving it to our users…’
This likely would have been an effective impeachment of Lehman if “training data” meant some kind of user data. But after DOJ concluded its re-direct examination, Judge Mehta asked Lehman what “training data” referred to. Lehman explained it was different from user search data.”
Sensitive Topics. Lehman was also asked by DOJ attorney Erin Murdock-Park about a slide from one of his slide decks on “Sensitive Topics” that instructed employees to “not discuss the use of clicks in search…”
According to reporting from Big Tech on Trial (via X), Lehman said “we try to avoid confirming that we use user data in the ranking of search results.”
The reporter’s X post says “I didn’t get great notes on this, but I think the reason had something to do with not wanting people to think that SEO could be used to manipulate search results.”
Google = liars? Since discovering this testimony, SEOs have been quick to use Lehman’s quotes as definitive proof that Google has been lying about using clicks or click-through rate for all of its 25 years.
The question of whether Google uses clicks was the first question asked last week during an AMA with Google’s Gary Illyes at Pubcon Pro in Austin. Illyes’ answer was “technically, yes,” because Google uses historical search data for its machine-learning algorithm RankBrain.
Technically yes, translated from Googler speak, means yes. RankBrain was trained on user search data.
We know this because Illyes already told us this in 2018. He said RankBrain “uses historical search data to predict what would a user most likely click on for a previously unseen query.”
RankBrain was used for all searches, impacting “lots” of them, starting in 2016.
Google Search tracks everything. But the fact that Google tracks clicks in Search does not mean clicks are used as a direct ranking factor. In other words, it does not follow that if site A gets 100 clicks and site B gets 101 clicks, site B automatically jumps up to Position 1.
Much like how Google uses its people to rate the quality of its search results, Google is likely using clicks to rate the results for queries and train its ranking systems.
Why we care. Does Google use clicks? Yes. But again, probably not as a ranking signal (though admittedly I can’t say that with 100% certainty as I don’t work at Google or have access to the algorithm). I know clicks are noisy and easy to manipulate. And for many sites/queries, there simply wouldn’t be enough data to evaluate to make it a useful ranking signal for Google.
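To put rough numbers on the “noisy” point, here is a minimal sketch in Python. It is purely illustrative and says nothing about how Google actually processes clicks. The impression counts (1,000 per site) are hypothetical, and the Wilson score interval is simply a standard statistical way to estimate the uncertainty around a click-through rate. The helper names `wilson_interval` and `intervals_overlap` are my own.

```python
import math

def wilson_interval(clicks, impressions, z=1.96):
    """Approximate 95% Wilson score interval for a click-through rate."""
    if impressions == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = clicks / impressions
    denom = 1 + z**2 / impressions
    center = (p + z**2 / (2 * impressions)) / denom
    half = z * math.sqrt(
        p * (1 - p) / impressions + z**2 / (4 * impressions**2)
    ) / denom
    return (center - half, center + half)

def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical numbers: both sites shown 1,000 times for the same query.
site_a = wilson_interval(100, 1000)
site_b = wilson_interval(101, 1000)

# A low-volume query: 1 click out of 10 impressions (also hypothetical).
small_sample = wilson_interval(1, 10)

print("Site A CTR range:", site_a)
print("Site B CTR range:", site_b)
print("Ranges overlap:", intervals_overlap(site_a, site_b))
print("Low-volume CTR range:", small_sample)
```

With these made-up inputs, the ranges for site A and site B overlap almost entirely, so a one-click difference tells you essentially nothing. The range for the low-volume query is far wider still, which is the “not enough data” problem in miniature.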
Dig deeper. The biggest mystery of Google’s algorithm: Everything ever said about clicks, CTR and bounce rate
https://searchengineland.com/former-googler-google-using-clicks-in-rankings-432401