Is Google’s Two Waves of Indexing Over?


Last week I was invited to Google Zürich to meet with Googlers John Mueller and Martin Splitt and discuss JavaScript SEO. Not only was it a fantastic experience, but I also walked away with a whole new understanding of how Google renders and indexes websites. Here are my three main takeaways from the discussion:

Want to know more about what Bartosz learned in his conversation with John Mueller and Martin Splitt? Then check out “JavaScript SEO is Dead, Long Live JavaScript SEO!”



This is the relevant transcript from the Google Webmaster Central Office-hours Hangout, as recorded on August 23, 2019. This hangout was hosted by Googlers John Mueller and Martin Splitt from Google Zürich. Onely’s CEO Bartosz Goralewicz was the invited guest.  

Bartosz Goralewicz: What’s the factor, what’s the metric that [decides], “OK, this website goes into two waves, and this one isn’t”? So we see that quite a lot where JavaScript websites are not [indexed] within two waves. Maybe there’s not enough JavaScript or whatever. Actually, we have quite a lot of bots recently to play with that because we’re investigating that.

Martin Splitt: Do you want to answer that? Or should I answer that?

John Mueller: I can hand wave.


MS: These days two-wave indexing – or the two waves of indexing – play less and less of a role. So, basically, generally speaking, you may see a lot of websites that are not using JavaScript that are still going through basically two waves. And you might see some, you might see –

BG: Wait, wait, wait, wait. Explain that. 

MS: Right. OK, so how do I put this?

BG: So is it like heavy CSS or –

MS: No, so here’s the thing. Pretty much every website, when we see them for the first time, goes to rendering. So there’s no indexing before it hasn’t been rendered. And, there are certain heuristics, that, if we see after a while, like, oh, this page, actually, the renderer does not diff as much or doesn’t diff, it looks the way before, like, we get – so what happens –

BG: So we get a new domain –

MS: Right, we do a crawl, right, which means, yeah, let’s say you get a new domain –

BG: You learn how much CPU this new domain was –

MS: No.

BG: – taking – 

MS: No, that’s not what we do. What we do is we do an HTTP request, and we get something back, right – some HTML, maybe it’s a barebone HTML and all it does is load the JavaScript and run the JavaScript. Then, this HTML that we got from the original HTTP GET request from the crawl, goes into rendering. Rendering runs JavaScript – boom! – a lot of content happens that wasn’t there before – so we’re like, aha! OK, so this needs to be rendered. BUT there is a heuristics that is very, very –

BG: So, you look at the difference between the initial HTML, and, then, if after rendering you see extra content?

MS: Yeah. And the interesting thing is that, so, what I want to make very, very clear because I talked to the team and I was surprised about this. I thought this was a lot more, this is still a lot, like, more frequently happening that we are going like: “Oh, all right, we are gonna skip rendering.” It is not as frequently happening anymore. So, like, for many, many websites even if they do not run JavaScript, they might still go through the render phase, because it doesn’t make a difference as much.

JM: Because it’s cheap for us.

MS: It’s cheap. It’s cheaper than the complexity that we infer, so, like, there’s very, very few cases, and the internals of that are very complicated, and I still haven’t fully, like, grasped what exactly triggers the heuristics –

BG: Because what we see is that there is quite a lot of JavaScript websites that never go through, like, two waves. And, there are some websites that go through two waves, like we, again, we don’t see really the difference. So one of the factors for you is, like, the difference between the –

MS: – initial crawl –

BG: – initial HTML, whatever, and then the rendered DOM. 

MS: Yeah, crawled DOM and rendered DOM.

BG: OK, that’s interesting.
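
Editor's note: the comparison Martin describes, crawled HTML versus rendered DOM, can be approximated on your own pages. The sketch below is only an illustration and is not Google's heuristic, whose internals Martin says are complicated. It assumes you already have both snapshots as strings (for example, the raw HTTP response and the output of a headless browser). The names `TextExtractor`, `visible_text` and `render_similarity` are hypothetical, and the choice of Python's `difflib` as the similarity measure is an assumption made for this example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a script or style block.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the list of visible text chunks found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def render_similarity(crawled_html, rendered_html):
    """Return a 0..1 ratio: 1.0 means rendering added or changed no text."""
    return SequenceMatcher(
        None, visible_text(crawled_html), visible_text(rendered_html)
    ).ratio()


# Hypothetical snapshots: a bare JavaScript shell and what it renders to.
crawled = '<html><body><div id="app"></div><script src="app.js"></script></body></html>'
rendered = '<html><body><div id="app"><h1>Products</h1><p>Ten items in stock</p></div></body></html>'
# A static page whose rendered output is identical to its crawled HTML.
static = "<html><body><h1>About</h1><p>Plain page</p></body></html>"

print(render_similarity(crawled, rendered))  # 0.0, all content comes from JavaScript
print(render_similarity(static, static))     # 1.0, rendering changes nothing
```

A low ratio suggests the page depends on rendering for its content; a ratio near 1.0 suggests the crawled HTML already contains it. Any threshold you pick between those two is your own call, not something stated in the hangout.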

MS: And I wouldn’t say that two waves of indexing are dead, but it’s definitely something that –

BG: Oh, they’re definitely not.

MS: They’re absolutely not, but it’s definitely – I expect, eventually rendering, crawling and indexing will come closer together. We’re not there yet, but I know the teams are looking into it. 

BG: So you would –

MS: No plans, no deadlines, no road maps to be announced yet. But –

BG: You winked twice!
