This is a summary of the most interesting questions and answers from the Google SEO Office Hours with John Mueller on February 25th, 2022.
Links report in Search Console
05:41 “There was a domain where there used to be a website on it, and then […] it was deleted. [If] there’s a new website at some point, then the links to the old website don’t count anymore, which seems pretty logical. […] In Search Console, I do see at least one link from the former owner that’s still there. Does [it] mean that this link would still count [if] Search Console […] shows it?”
John replied, “I don’t know if it would count, but the important part with Search Console and the Links report there is we try to show all the links that we know of to that site. It’s not a sign that we think these are important links or that they count. In particular, things like nofollow links would still be listed, disavowed links would still be listed, links that we ignore for other reasons could still be listed as well. So just because it’s listed doesn’t mean it’s a relevant or helpful link for the site.”
09:13 “We acquired a site that came with plenty of internal links in the footer section of each page. Some of these links are not as relevant. […] Is it likely to be seen as problematic by Google because the links are not necessarily contextual? […] They’re just jumbled together in the footer, and they’re just selected by a plugin. We’re scared that removing them from hundreds of pages might mess up the site structure.”
John said, “I suspect for the most part that wouldn’t cause any problems. I would see this more as these links on these pages are normal internal links.
Look at it from the point of view: does this help […] give context to the rest of the pages on the website? For example, if you have a larger website and essentially every page is linked with every other page, there’s no real context there, so it’s hard for us to understand what the overall structure is, which of these pages [are] more important. Because if you’re linking to everything, then it’s like everything is not important. That’s the element that I would watch out for. Whether or not they’re in the footer, that, from my point of view, is irrelevant. If they’re generated by a plugin or added manually, I don’t think that matters either. I would watch out from a structural point of view: does it make sense to have these links? Some amount of cross-linking makes sense. If you have a setup where you have related pages that are cross-linked, that, from my point of view, always makes sense. Extreme cross-linking, where you’re cross-linking every page with every other page, doesn’t make sense from my point of view.”
11:10 “[…] If we disable the plugin and all these links suddenly […] miss out from the page, would it affect the website in any way? Or should we try to slowly remove links from one page at a time?”
John: “My guess is it would affect how it’s shown in search, but it’s impossible to say whether it will be a positive or a negative effect. I think that’s the tricky part there.
What I would do to try to understand what the current situation is and what the next step would be is to run a crawler over your website. A lot of the website crawlers that are out there will generate a graph of how your pages are linked together. Then you could disable the plugin, maybe on a staging version of your website, maybe even on the live one for a short period of time. Crawl again and then compare those graphs and see, is this crawler still able to find all of the content? Does it look like there’s insufficient cross-linking there? If so, then that gives you a little bit more trust that just disabling the plugin will be okay. From our point of view, it doesn’t matter if these links are automatically placed or placed by a plugin or placed by machine learning. […] They’re just links that we find on your website.”
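John’s compare-the-graphs suggestion can be sketched in a few lines of Python. This toy example uses an in-memory site instead of a live crawl (the URLs and markup are made up), but the idea is the same: compute which pages are reachable by following internal links, with and without the plugin’s footer links, and diff the two sets.

```python
from collections import deque
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def reachable_pages(site, start):
    """BFS over internal links; `site` maps URL -> HTML string."""
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(site.get(url, ""))
        for link in parser.links:
            if link in site and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

# Toy site: /products is only linked from the plugin's footer on /.
with_plugin = {
    "/": '<a href="/blog">Blog</a> <footer><a href="/products">Products</a></footer>',
    "/blog": '<a href="/">Home</a>',
    "/products": '<a href="/">Home</a>',
}
without_plugin = {k: v.replace('<footer><a href="/products">Products</a></footer>', "")
                  for k, v in with_plugin.items()}

# Pages that would become unreachable if the plugin were disabled.
orphaned = reachable_pages(with_plugin, "/") - reachable_pages(without_plugin, "/")
print(orphaned)  # {'/products'}
```

In practice you would replace the in-memory dict with output from a real crawler run against the live or staging site, but the diff step stays the same.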
13:22 “Is all hidden text against Webmaster guidelines? […] We have some elements we include across multiple pages along with internally assigned identifiers for each element, so those identifiers don’t mean anything to users […], but as an SEO, it can make my life a lot easier. […]”
John’s answer was: “I don’t think that would be problematic. Hidden text, from our point of view, is more problematic when it’s really about deceiving the search engines with regards to what is actually on a page. So the extreme example would be that you have a page about shoes and there’s a lot of hidden text there that is about the Olympics […], and then suddenly your shoe page starts ranking for these Olympic terms. But when a user goes there, there’s nothing about the Olympics, and that would be problematic from our point of view.
I think we do a reasonable job in recognizing hidden text and trying to avoid that from happening, but that’s the reason we have this element in the Webmaster guidelines. Using hidden text to display something where you’re not trying to deceive anyone is, from my point of view, perfectly fine. Also, accessibility is a common reason for hidden text on a page, where you have the tap targets set up in such a way that you can tap on them, and then your screen reader will say something about that. But they’re not visible directly on the page, and that’s also perfectly fine, so I wouldn’t have any fear or any doubt that the setup you described would work.”
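For reference, the accessibility pattern John describes usually looks something like the snippet below: text that is visually hidden with CSS but still announced by screen readers. The class name and markup here are illustrative, not a Google requirement.

```html
<!-- Visually hidden but available to screen readers: a common,
     legitimate accessibility pattern (class name is illustrative). -->
<style>
  .visually-hidden {
    position: absolute;
    width: 1px; height: 1px;
    overflow: hidden;
    clip: rect(0 0 0 0);
    white-space: nowrap;
  }
</style>
<a href="/cart">
  <svg aria-hidden="true"><!-- cart icon --></svg>
  <span class="visually-hidden">View shopping cart</span>
</a>
```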
The importance of image file names for site ranking
24:56 “We are using an intelligent CDN provider which has been replacing the image file names with unique numbers. We have noticed all the images are 404s in the Search Console. Disabling the CDN would significantly degrade the overall site performance. Will image alt text and captions be sufficient for Google to understand without an appropriate file name or title?”
According to John, “There are two things here that I would look at. On the one hand, if these are images that you need to have indexed in image search, then you should make sure that you have a stable file name for your images. That’s the most important element here.
You don’t mention that these numbers or these URLs change, but sometimes these CDNs essentially provide a session-based ID for each image. If the image URL changes every time we crawl, then essentially, we’ll never be able to index those images properly. This is mostly because for images we tend to be a little bit slower with regards to crawling and indexing. So if we see an image once and we say we should take a look at this, and we try to crawl it again at some later stage, and the number has changed by then, then we will drop that image from our search results, from the image rankings. Essentially we’ll say, well, this image that we thought was here is no longer here. The most important part here is figuring out whether you care about image search. If so, you need to make sure that you have a stable URL for all of these images. It doesn’t matter if it’s a number or if it’s a text or anything like that. It should just be stable. That’s the most important part here.
The other part that you mentioned is the image alt text and the captions, which suggests that you’re interested in web search, not necessarily image search. For web search, we don’t need to be able to crawl and index the images because we essentially look at the web pages themselves. So things like the alt text, any captions, headings on the page, all of that play into understanding this page a little bit better. For web search, that’s all we need. If all of the images were, for example, 404 all the time or blocked by robots.txt, for web search we would still treat that page exactly the same as if we were able to index all of those images. In image search itself, that’s where we need to be able to index these images and understand that there [are] stable URLs and understand how they connect with the rest of your site.”
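As an illustration of that distinction, a page can describe an image for web search even when the image URL itself is unstable or never crawled. The URL and text below are made up:

```html
<figure>
  <!-- A stable image URL matters for image search; the alt text and
       caption help web search understand the page even if the image
       itself is never crawled. (URL and copy are illustrative.) -->
  <img src="https://cdn.example.com/img/12345.jpg"
       alt="Red trail-running shoe, side view">
  <figcaption>Our lightest trail-running shoe this season.</figcaption>
</figure>
```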
Targeting two different pages with the same keyword
29:36 “One [page] is a feature page, and the other is an informational piece about that feature. Is it okay to target the same main keyword on those two different pages?”
John said, “First of all, it’s totally okay to target whatever keywords that you want. From our point of view, we’re not going to hold you back.
The thing I would watch out for is like are you competing with yourself? And that’s almost more of a strategic question rather than a pure SEO question and not something where we’d say there are guidelines that you should not do this. But it’s more that if you have multiple pieces of content that are ranking for the same query with the same intent, then you’re essentially diluting the value of the content that you’re providing across multiple pages. They’re competing with each other, and that could mean that these individual pages themselves are not that strong when it comes to competing with other people’s websites. So that sometimes is what I would watch out for.
If you have two pages and they’re both targeting the same keyword, and they have very different intents, then, from my point of view, that seems reasonable because people might be searching for that keyword with extra text added for one intent and extra text added for the other intent. And they’re essentially unique pages. It can make sense for both of them to appear in search or the best matching one to appear in search. […] And like I said, it’s not something that we require or that is okay or not okay. It’s a matter of your strategic positioning on how you want to appear in search.”
Is there a good ratio for indexed vs. non-indexed pages?
31:26 “Are positions of high-traffic ranking pages hurt by many, let’s say 50%, of total pages on a domain not being indexed or being indexed but not receiving traffic?”
John replied: “I guess the question is more around I have some set of pages that are very popular and a lot of pages that are not very popular. And that describes the average website where you have a variety of content, and some of it is very popular, and some of it isn’t that popular. So, from our point of view, that’s perfectly fine.
Also, just the bulk number of pages is a misleading metric, because it’s easy to have a lot of pages that are not being seen as very important, and then they don’t appear a lot in search, and that can be perfectly fine. If you have five pages like that, or a hundred […] or a thousand pages, [and] if they’re not showing up in search, they’re not causing any problems either. From that point of view, it’s sometimes tricky to look at the bulk number of pages versus the pages that are being shown in the search results.
The other thing, maybe also to keep in mind with a question like this, is sometimes it does make sense to concentrate more on fewer pages to get that strategic advantage of having fewer pages that are stronger, rather than having a lot of pages where you’re essentially diluting the value. Like if you have a thousand pages and they each provide a small tidbit of the bigger picture, then those thousand pages are probably going to have a hard time in search. Whereas if you can combine a lot of that into […] maybe ten pages, then those ten pages will have a lot of information on them and maybe a lot of value relative to the rest of your site, and may have an easier time ranking for broader search terms around that topic.”
Search results within websites vs. ranking
37:49 “I’m trying to make sure that our SEO rankings don’t take a hit while we roll out a new search results page. […] Our searches can result in 10,000 results and have filtering and sorting functionalities. How does Google treat these search result pages within websites, [and] how do these search results affect the overall ranking for the website? Is it enough to just be submitting sitemaps for ranking, or should we take additional considerations to help Googlebot gather reachable URLs?”
John replied: “[…] I would not rely on sitemaps to find all of the pages of your website. Sitemaps should be a way to give additional information about your website. It should not be the primary way of giving information [about] your website. So, in particular, internal linking is super important and something you should watch out for and make sure that however you set things up when someone crawls your website, they’re able to find all of your content and not that they rely on the sitemap file to get all of these things.
From that point of view, being able to go to these category pages and being able to find all the products that are in individual categories, I think, is super useful. Being able to crawl through the category pages to the product is also very important. Search results pages are a little bit of a unique area because some sites use category pages essentially like search results pages, and then you’re in that situation where search results pages are essentially like category pages. If that’s the case for you, I would watch out for everything that you would do with category pages.
The other thing with search results pages is that people can enter anything and search for something and your site has to do all the work to generate all of these things. That can easily result in an infinite number of URLs that are theoretically findable on your website because people can search in a lot of different ways. And because that creates this set of infinite pages on your website, that’s something that we try to discourage where we’d say either set these search results pages to noindex or use robots.txt to block crawling of these search results pages so that we can focus on the normal site structure and the normal internal linking. I think those are the primary aspects there.
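The two options John mentions look like this in practice, assuming (purely for illustration) that the internal search pages live under /search:

```
# robots.txt — block crawling of internal search results
# (adjust the path to your own URL pattern)
User-agent: *
Disallow: /search
```

```html
<!-- Or, if the pages should stay crawlable but not be indexed,
     on each search results page: -->
<meta name="robots" content="noindex">
```

Note that the two approaches behave differently: robots.txt prevents crawling but a blocked URL can still be indexed from links alone, while noindex requires the page to be crawled so the directive can be seen.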
If you do want to have your search results pages indexed, then my tip would be to make sure that, on the one hand, you have one primary sort order and filtering setup set as the canonical. So if you choose to present your pages sorted by relevance, and you also have a filter to sort by price ascending or descending, then I would set the rel=”canonical” of those filtered versions to your primary sort order. Similarly, for filtering, I would perhaps remove the filter with the rel=”canonical.” Doing this makes sure that we can focus more on the primary version of the pages and crawl those properly, rather than getting distracted by all of these variations of the search results pages.
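A sketch of that canonical setup, with made-up URLs: a re-sorted or filtered variant of a search results page points back at the primary version.

```html
<!-- Served on /search?q=shoes&sort=price_asc — the canonical points
     at the primary sort order (URLs are illustrative): -->
<link rel="canonical" href="https://example.com/search?q=shoes">
```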
The other thing I would watch out for is that you create some kind of an allow list or […] a system on your site with regards to the type of search queries that you want to allow to be indexed or crawled. For example, if someone goes to your website and searches for “Canadian pharmaceuticals” or something like that, and you’re not a pharmaceutical website, you probably don’t want that search page to be indexed. Even if you don’t have any products that are available that match that query, you probably don’t want to have that indexed. So having a list of the searches that you do allow to be indexed makes that a lot easier. That way, you make sure that you don’t accidentally run into this spam situation where someone is spamming your search results, and then you have to clean up millions of pages that are indexed and get rid of them somehow.”
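A minimal sketch of such an allow list, with illustrative names and queries: only pre-approved searches produce an indexable results page, and everything else is marked noindex.

```python
# Query allow list for internal search pages (contents are illustrative).
ALLOWED_QUERIES = {"running shoes", "trail shoes", "sandals"}

def robots_meta_for_search(query: str) -> str:
    """Return the robots meta value for a search results page.

    Queries are normalized before lookup so that casing and stray
    whitespace don't bypass the allow list.
    """
    normalized = query.strip().lower()
    if normalized in ALLOWED_QUERIES:
        return "index, follow"
    return "noindex"

print(robots_meta_for_search("Running Shoes"))             # index, follow
print(robots_meta_for_search("Canadian pharmaceuticals"))  # noindex
```

The returned value would be emitted into the page’s `<meta name="robots">` tag, so spammy or off-topic queries never create indexable URLs in the first place.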
Page Experience update on desktop vs. ranking
42:20 “My website had a drop in visitors due to poor Core Web Vitals. Now I am back on track but came to know that the Page Experience update is now slowly rolling out for desktop. What is the Page Experience ranking factor on desktop, and how important is it compared to other ranking signals?”
According to John, “Like on mobile, the Page Experience ranking factor is essentially something that gives us a little bit of extra information about how these different pages could show up in the search results. In situations where we have a […] clear intent from the query, where we can understand that they really want to go to this website, then, from that point of view, we can ease off on using Page Experience as a ranking factor. On the other hand, if all of the content is very similar on the search results page, then probably using Page Experience helps a little bit to understand which of these are fast pages or reasonable pages with regards to the user experience and which of these are the less reasonable pages to show in the search results. That’s the situation where it helps us.
With regards to the desktop rollout, I believe this is going to be a slower rollout again, over the course of something like a month, which means you would not see a strong effect from one day to the next, but rather over a period of time. You would also already see in Search Console, in the reports for Page Experience and Core Web Vitals, if on desktop everything is red, for example, and that you need to focus on that. From that point of view, with the desktop ranking change, like with the mobile one, I wouldn’t expect a drastic jump in the search results from one day to the next as we roll this out. At most, if things are really bad for your website, you would see a gradual drop there.”
Non-indexed translated content
53:15 “I work on a large multilingual site. In April last year, […] all of our translated content moved from Valid to Excluded, Crawled – currently not indexed, and it has stayed there since April. […] Because it happened all at once, we thought maybe there was some systemic change on our side. […] We’ve cleaned up our hreflangs, canonicals, URL parameters, manual actions, and every other tool that’s listed on developers.google.com/search. […] I don’t know what’s happened or what to do next to try to fix the issue, but I’d like to get our translated content back in the index.”
John’s answer was: “[…] I think [it] is sometimes tricky [if] you have the parameter at the end with the language code. […] From our point of view, what can happen is that when we recognize that there are a lot of these parameters there that lead to the same content, then our systems can get stuck in a situation where they say, well, maybe this parameter is not very useful, and we should ignore it. To me, it sounds a lot like something along those lines happened. Partially, you can help this with the URL Parameters tool in Search Console, to make sure that that parameter is set to ‘I do want to have everything indexed.’ Partially, what you could also do is maybe to crawl a portion of your website with […] local crawler to see which parameter URLs actually get picked up, and then double-check that those pages actually have useful content for those languages.
In particular, […] common [thing] that I’ve seen on sites is maybe you have […] all languages linked up, and the Japanese version says, oh, we don’t have a Japanese version, here’s our English one instead. Then our systems could say, well, the Japanese version is the same as the English version ‒ maybe some other languages are the same as the English version, too. We should ignore that. Sometimes this is from links within the website; sometimes it’s also external links, people who are linking to your site. If the parameter is at the end of your URL, then it’s very common that there’s some garbage attached to the parameter as well. And if we crawl all of those URLs with that garbage and we say, well, this is not a valid language, here’s the English version, then it again reinforces that loop where our systems say, well, maybe this parameter is not so useful. The cleaner approach there would be, if you have kind of garbage parameters, to redirect to the cleaner ones, or maybe even show a 404 page and say, well, we don’t know what you’re talking about with this URL ‒ and to cleanly make sure that whichever URLs we find, we get some useful content that is not the same as other content which we’ve already seen.”
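The parameter clean-up John describes could be sketched like this; the supported languages, URL shapes, and function name are all illustrative:

```python
# Sketch of the "clean up garbage parameters" advice: validate the
# language code and either serve the page, redirect to a clean URL,
# or return a 404 instead of silently falling back to English.
SUPPORTED_LANGS = {"en", "de", "fr", "ja"}

def resolve_lang_param(raw: str):
    """Decide how to handle a ?lang= value.

    Returns (status, location): 200 to serve the page, 301 with the
    cleaned URL when the value is a valid code with trailing garbage,
    and 404 for anything unrecognizable.
    """
    candidate = raw.strip().lower()
    if candidate in SUPPORTED_LANGS:
        return 200, None
    # e.g. "ja%22" or "ja." — garbage appended to a valid two-letter code
    prefix = candidate[:2]
    if prefix in SUPPORTED_LANGS and candidate != prefix:
        return 301, f"/?lang={prefix}"
    return 404, None

print(resolve_lang_param("ja"))      # (200, None)
print(resolve_lang_param("ja%22"))   # (301, '/?lang=ja')
print(resolve_lang_param("zz"))      # (404, None)
```

The point of the redirect and 404 branches is to stop crawlers from finding an endless supply of parameter variations that all serve the same English fallback, which is the loop John says trains Google’s systems to ignore the parameter.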