First of all, Googlebot is now regularly updated to use the latest Chromium engine for rendering. Considering that before this update, Googlebot was using a four-year-old version of the rendering engine, it’s truly a massive step forward.
Secondly, it was announced during the Chrome Dev Summit in November 2019 that the delay between Googlebot crawling and rendering a page is now a median of 5 seconds. A year earlier, Googlers had said it could take up to a week.
So it seems that we need to reinterpret the definition of the crawl budget.
For years, optimizing the crawl budget meant, for the most part, dealing with index bloat.
Particularly when it comes to large websites, Googlebot is often forced to crawl through tons of worthless content to find the most important pages.
When an e-commerce website’s navigation is based on adding parameters to URLs, a couple of dozen product category pages can turn into thousands of duplicate URLs for Googlebot to crawl.
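To illustrate the scale of the problem, here’s a minimal sketch (the filter names and values are invented for illustration) of how just a few navigation parameters multiply a single category URL into hundreds of crawlable variants:

```python
from itertools import product

# Hypothetical faceted-navigation filters on one category page
filters = {
    "color": ["red", "blue", "green", "black"],
    "size": ["s", "m", "l", "xl"],
    "sort": ["price_asc", "price_desc", "newest"],
    "page": [str(n) for n in range(1, 11)],
}

# Every combination of filter values becomes a distinct URL for Googlebot
urls = [
    "/shoes?" + "&".join(f"{k}={v}" for k, v in zip(filters, combo))
    for combo in product(*filters.values())
]

print(len(urls))  # 4 * 4 * 3 * 10 = 480 URLs from a single category page
```

Four filters with a handful of values each already produce 480 URLs, and most real faceted navigations have far more, which is how a couple of dozen category pages balloon into thousands of near-duplicates.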
To mitigate this, SEOs work with business owners to make sure that all the important content is indexed.
This work involves the crawl rate limit and the crawl demand.
Crawl Rate Limit
This factor primarily depends on a given website’s server health. Since “Googlebot is designed to be a good citizen of the web,” it adjusts its crawling rate based on how the server reacts to continuous requests.
To avoid crashing the server and ruining the experience of users visiting the website, it will limit the crawling rate when the server responds poorly. Similarly, the crawling rate will go up if the server has no problem handling intense robot activity.
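The behavior described above can be sketched as a simple feedback loop. This is not Googlebot’s actual algorithm, just a toy illustration of the idea: slow or failing responses make the crawler back off, while fast, healthy responses let it speed back up. All thresholds here are assumptions chosen for the example:

```python
def adjust_crawl_delay(delay, status_code, response_time,
                       slow_threshold=2.0, min_delay=0.5, max_delay=60.0):
    """Toy politeness policy: back off when the server struggles,
    speed up when it responds quickly and successfully."""
    if status_code >= 500 or response_time > slow_threshold:
        # Server errors or slow responses: double the wait, up to a ceiling
        delay = min(delay * 2, max_delay)
    elif status_code == 200 and response_time < slow_threshold / 2:
        # Fast, successful responses: gradually reduce the wait
        delay = max(delay * 0.75, min_delay)
    return delay

# A run of 503s doubles the wait between requests...
d = 1.0
d = adjust_crawl_delay(d, 503, 0.4)   # -> 2.0
d = adjust_crawl_delay(d, 503, 0.4)   # -> 4.0
# ...while fast 200s gradually restore it.
d = adjust_crawl_delay(d, 200, 0.3)   # -> 3.0
```

The asymmetry (doubling on trouble, shrinking slowly on success) mirrors the “good citizen” behavior described above: the crawler reacts quickly to a struggling server and only cautiously ramps back up.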
Alternatively, webmasters can manually decrease the crawling rate using Search Console.
The takeaway: Make your server and your page as fast and reliable as possible, and Googlebot will be able to crawl as much as it deems necessary.
Crawl Demand
Crawl demand is the priority that Google assigns to crawling a given website. The two main factors influencing the crawl demand are popularity and staleness.
Here’s how Gary Illyes from the Google Search team defined those:
Straightforward, right? Well, the SEO community picked these two points apart and came up with several different interpretations of what Google could have actually meant.
To give you an example, Gary Illyes stated that popular URLs tend to get crawled more often. Some people assumed that popularity reflects traffic volume. Others concluded (with some rather dated evidence) that a page’s popularity is reflected by internal and external linking as well as the number of keywords a given URL is ranking for.
It’s also unclear what Google exactly meant by saying, “our systems attempt to prevent URLs from becoming stale in the index.” Do they factor in the time since the URL was last crawled, or do they have a way of predicting changes in the content? Maybe both of these are used among many other variables…
Every content creator is led to believe that creating great content offering users what they truly need is the main prerequisite for doing well in search. And it’s true, for the most part.
But in some cases, it may not be enough. Even if you did your best to provide content that’s optimized in every way and you expect it to gain traction, search engines may see it differently. Not because they aren’t fond of your writing, but because they may not even see your content.
The only way to get some organic traffic is to get your pages crawled and indexed, and as we’ve observed, you can never be completely sure as to when that will happen.
While some don’t really care if they appear in the search results only after a couple of hours, for others, being 30 minutes late means handing all the potential traffic to a faster competitor.
In light of this, it seems that the definition of the crawl budget should be expanded.
Why is this an issue?
And if the content isn’t crawled fully, the website may become trapped in the vicious cycle of a low crawl budget:
- Googlebot only discovers parts of the page.
- Google assumes that the page is of low quality.
- The crawl budget gets further lowered.
- And back again…
Google’s official definition of the crawl budget – “the number of URLs Googlebot can and wants to crawl” – is still perfectly valid. However, for SEOs, the crawl budget has always been a working definition rather than a concrete metric – the aggregate of issues that influence how often and how thoroughly a website is crawled by the search engines.