Solve the index coverage problem



Having problems indexing in Google? This issue may result in lower traffic and conversion rates. It is necessary to check the indexed and non-indexed pages of your site to quickly solve any problem. Here we explain how to do it step by step using Google Search Console Index Coverage Report. Using the following method, we were able to fix the index coverage issue on hundreds of sites with millions or billions of excluded pages. Use it so none of your relevant pages lose visibility in search results and boost your SEO traffic.

Step 1: Check Index Coverage Report

The Search Console Coverage report tells you which pages have been crawled and indexed by Google, and why the URLs are in that specific state. You can use it to detect any errors that are found during the crawling and indexing process.


To check the index coverage report, go to Google Search Console and click “Coverage” (under the “Index” section). Once you open it, you’ll see a summary of four different states that categorize your URLs:

  • Error: These pages cannot be indexed and will not appear in search results due to some errors.
  • Valid with caveats: These pages may or may not be shown in Google search results.
  • Valid: These pages have been indexed and can be shown in search results. You don’t need to do anything.
  • Excluded: These pages have not been indexed and will not appear in search results. Google believes that you do not want to index them or consider the content unworthy of indexing.

You need to check and fix all the pages in the Error section as soon as possible, because otherwise you may lose the opportunity to drive traffic to your site. If you have time, also review the pages in the Valid with warnings status, as they may include pages that should not appear in search results at all. Finally, make sure that the excluded pages really are pages you do not want indexed. There are several variants of the index coverage issue, which we present below.

Step 2: How to solve the problems in each index coverage status

Once you open the Index Coverage report, select the desired status (Errors, Valid with Warnings, or Excluded) and see the details provided at the bottom of the page. You will find a list of error types according to their severity and the number of affected pages, so we recommend that you start investigating problems from the top of the table. Let’s see each error in different cases and how you can fix it.

Error status


Server errors (5xx)

These are URLs that return a 5xx (server error) status code to Google, which is an index coverage issue.

Actions to be taken:

  • Check which 5xx status code is being returned. Here you have a complete list of server error status code definitions.
  • Reload the URL to see if the error persists. 5xx errors are often temporary and may require no action.
  • Verify that your server is not overloaded or misconfigured. In that case, ask your developers for help or contact your hosting provider.
  • Perform a log file analysis to check your server’s error logs. This gives you additional information about the problem (see the sketch after this list).
  • Review any changes you’ve made to your website recently (e.g., plugins, new backend code) to see if one of them is the root cause.
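
As a starting point for the last two checks, here is a minimal Python sketch. The URL and log path are hypothetical placeholders, and the log is assumed to use the common combined format; it re-requests the URL to see the current status code and counts 5xx hits per path in an access log.

    # Re-check a URL's status code and scan an access log for 5xx responses.
    # The URL and log path below are hypothetical placeholders.
    import re
    import requests

    url = "https://example.com/some-page"      # page reported with a 5xx error
    log_path = "/var/log/nginx/access.log"     # adjust to your server's log location

    response = requests.get(url, timeout=10)
    print(url, "->", response.status_code)     # 500-599 means a server error

    # Count 5xx responses per path, assuming a combined-format access log
    pattern = re.compile(r'"[A-Z]+ (\S+) HTTP/[^"]*" (5\d{2}) ')
    counts = {}
    with open(log_path) as log:
        for line in log:
            match = pattern.search(line)
            if match:
                path, _status = match.groups()
                counts[path] = counts.get(path, 0) + 1

    for path, hits in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        print(hits, path)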

Redirect errors

Googlebot encountered an error while following a redirect, which prevented the page from being crawled. Any of the following causes is likely behind this problem:

  • The redirect chain was too long
  • There was a redirect loop
  • A redirect URL exceeded the maximum URL length
  • There was a wrong or empty URL in the redirect chain

Actions to be taken:

Get rid of redirect chains and loops. Each URL should perform only one redirect, that is, a single redirect from the first URL straight to the final URL.
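
A quick way to see how many hops a URL performs is to follow its redirects and inspect the history. This is a minimal Python sketch; the URL is a hypothetical placeholder.

    # Follow a URL's redirects and report the chain.
    import requests

    url = "https://example.com/old-page"  # hypothetical URL to test
    response = requests.get(url, allow_redirects=True, timeout=10)

    print("Redirect hops:", len(response.history))  # ideally 0 or 1
    for step in response.history:
        print(step.status_code, step.url)
    print("Final URL:", response.url)
    # Note: requests raises TooManyRedirects if it detects a redirect loop.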

Submitted URL blocked by robots.txt

These URLs were submitted for indexing, but rules in your robots.txt file prevent Googlebot from crawling them.

Actions to be taken:

Check whether you want search engines to index the page in question or not.

  • If you don’t want it indexed, remove the URL from your XML sitemap.
  • Conversely, if you want it indexed, change the instructions in the robots.txt file (a minimal robots.txt sketch follows this list). Here is a guide on how to edit a robots.txt file.
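
For reference, here is a minimal robots.txt sketch with hypothetical paths. Removing a Disallow line (or adding a more specific Allow rule) is what lets Googlebot crawl a page you do want indexed.

    User-agent: *
    Disallow: /private/              # keep crawlers out of this section
    Allow: /private/visible-page/    # but allow this specific page
    Sitemap: https://example.com/sitemap.xml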

Submitted URL marked ‘noindex’

These pages were submitted to Google via an XML sitemap, but they carry a “noindex” directive in either the robots meta tag or the HTTP headers.

Actions to be taken:

  1. If you want the URL to be indexed, you must remove the noindex directive (an example of the meta tag and header form follows below).
  2. If there are URLs that you don’t want Google to index, remove them from your XML sitemap.
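
For reference, a noindex directive usually looks like one of the two sketches below; the header example assumes an Apache server with mod_headers enabled.

    <!-- In the page <head>: tells search engines not to index this page -->
    <meta name="robots" content="noindex">

    # Or as an HTTP response header (Apache example, requires mod_headers):
    Header set X-Robots-Tag "noindex"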
Submitted URL returns unauthorized request (401)

These URLs were submitted to Google via an XML sitemap but return a 401 status code. This status code tells you that access to the URL is not authorized: a username and password may be required, or there may be access restrictions based on IP address. This is another index coverage issue.

Actions to be taken:

  • Check whether the URLs should return the 401 error. If so, remove them from the XML sitemap.
  • If you don’t want them to return a 401 code, remove the HTTP authentication, if there is any.

Submitted URL not found (404)

You submitted this URL to Google Search Console for indexing, but it returns a 404 (not found) status code, so Google cannot index it.

Actions to be taken:

  • Check whether you want the page to be indexed or not. If yes, fix it so that it returns a 200 status code, or set up a 301 redirect to another URL so that visitors land on a proper page. Remember that if you choose the redirect, you need to add the destination URL to the XML sitemap and remove the URL that returns the 404 (a minimal sitemap sketch follows this list).
  • If you don’t want the page to be indexed, remove it from the XML sitemap.
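
If you go the redirect route, the sitemap should end up listing only the live destination URL. A minimal sketch with hypothetical URLs:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <!-- List only the live destination URL; the old URL that returned 404 is removed -->
      <url>
        <loc>https://example.com/new-page/</loc>
      </url>
    </urlset>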

Submitted URL has a crawl issue

You submitted the URL to GSC for indexing, but Google cannot crawl it due to an issue different from the ones described above.

Actions to be taken:

  • Use the URL Inspection tool to get more information about the cause of the problem.
  • Sometimes these errors are temporary, so they do not require any action.

Valid with warnings status


These pages are indexed even though they are blocked by robots.txt. Google always tries to follow the directives given in the robots.txt file, but sometimes it behaves differently, and that is an index coverage issue. This can happen, for example, when someone links to the given URL. URLs end up in this category because Google is not sure whether you really want these pages blocked from search results.

Actions to be taken:

  • Google does not recommend using robots.txt to prevent indexing. If you don’t want these pages indexed, use a noindex directive in the robots meta tag or the HTTP response header instead.
  • Another good practice to prevent Google from accessing the page is to implement HTTP authentication (a minimal example follows this list).
  • If you do not want to block the page, make the necessary corrections in the robots.txt file.
  • You can identify the rule that is blocking the page with the robots.txt testing tool.
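
As an illustration of the HTTP authentication option, here is a minimal sketch assuming an Apache server; the paths are hypothetical, and the credentials live in a .htpasswd file created with the htpasswd utility.

    # .htaccess in the directory you want to protect (Apache, hypothetical paths)
    AuthType Basic
    AuthName "Restricted area"
    AuthUserFile /path/to/.htpasswd   # created with the htpasswd utility
    Require valid-user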

Excluded status


These pages are not indexed in search results, and Google thinks that is the right call. For example, this could be because they are duplicates of indexed pages, or because your website tells search engines not to index them. Below are 15 cases in which a page can be excluded; each of them is a variant of the index coverage problem.

1- Excluded by “noindex”

You are telling search engines not to index the page by giving a “noindex” directive.

Actions to be taken:

  • Check whether you really want the page to stay out of the index. If you want the page to be indexed, remove the “noindex” tag.
  • You can confirm the directive is present by opening the page and searching for “noindex” in the response body and the response headers (a quick check is sketched below).
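
A quick way to run that check is sketched below in Python; the URL is a hypothetical placeholder, and the body check is a rough string match rather than a full HTML parse.

    # Look for "noindex" in the X-Robots-Tag header and in the HTML body.
    import requests

    url = "https://example.com/some-page"  # hypothetical URL
    response = requests.get(url, timeout=10)

    header_value = response.headers.get("X-Robots-Tag", "")
    print("X-Robots-Tag header:", header_value or "(not set)")

    body = response.text.lower()
    print("robots meta tag with noindex likely present:",
          'name="robots"' in body and "noindex" in body)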

2- Blocked by page removal tool

You have submitted a request on GSC to remove the URL of these pages.

Actions to be taken:

  • Google only honors a removal request for about 90 days, so if you want the page to stay out of the index permanently, use a “noindex” directive, implement HTTP authentication, or remove the page.

3- Blocked by robots.txt

You are preventing Googlebot from accessing these pages with a robots.txt file. However, a page can still be indexed if Google finds information about it without loading it, for example through links from other websites, or because Google indexed the page before the disallow rule was added to robots.txt.

Actions to be taken:

  • If you don’t want the page to be indexed, use a “noindex” directive and remove the robots.txt block so Googlebot can see the directive (you can verify what robots.txt currently allows with the sketch below).
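
To verify what your robots.txt currently allows, the Python standard library includes a robots.txt parser; this minimal sketch uses hypothetical URLs.

    # Check whether robots.txt allows Googlebot to fetch a given URL.
    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser("https://example.com/robots.txt")  # hypothetical domain
    parser.read()

    url = "https://example.com/blocked-page/"
    print("Googlebot allowed:", parser.can_fetch("Googlebot", url))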

4- Blocked due to unauthorized request (401)

Googlebot is blocked from accessing these pages by an authorization request (a 401 response).

Actions to be taken:

  • If you want to allow Googlebot to visit the page, remove the authorization requirement.

5- Crawl anomaly

The page was not indexed because of a 4xx- or 5xx-type error response code.

Actions to be taken:

  • Use the URL Inspection tool to get more information about the issue.

6- Crawled – Not currently indexed

This page was crawled by Googlebot but not indexed. It may or may not be indexed in the future; there is no need to resubmit this URL for crawling.

Actions to be taken:

  • If you want the page to be indexed in search results, be sure to provide valuable information.

7- Discovered – Currently Not Indexed

Google found this page but has not been able to crawl it yet. This usually happens because the site was overloaded when Googlebot tried to crawl the page, so the crawl has been rescheduled. No action is required.

8- Alternate page with the appropriate canonical tag

This page points to another page as its canonical, so Google understands that you don’t want it indexed.

Actions to be taken:

  • If you want this page to be indexed, change its rel=canonical attribute so it gives Google the required instruction (a minimal canonical tag example follows below).
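
A canonical tag is simply a link element in the page head; here is a minimal sketch with a hypothetical URL.

    <!-- In the <head> of the duplicate page: point to the version you want indexed -->
    <link rel="canonical" href="https://example.com/preferred-page/">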

9- Duplicate without user-selected canonical

The page has exact duplicates, but none of them is marked as the canonical page, and Google does not consider this page to be the canonical one.

Actions to be taken:

  • Use canonical tags to indicate to Google which pages are canonical (which should be indexed) and which are duplicate pages. You can use the URL Inspection tool to see which pages have been marked as canonical by Google.

10- Google chose a different canonical than the user

You have marked this page as the canonical page, but Google has instead indexed another page that it thinks works better as the canonical.

Actions to be taken:

  • You can follow Google’s selection. In this case, mark the indexed page as canonical and this page as a duplicate of that canonical URL.
  • If not, find out why Google prefers another page over the one you chose, and make the necessary changes. Use the URL Inspection tool to discover the canonical page specified by Google.

One of the more curious “failures” we experienced with the Index Coverage report was discovering that Google wasn’t properly processing our canonical URLs (and we had been handling them the same way for years). Google was reporting in Search Console that the specified canonical page was invalid even though the page was perfectly fine. In the end, it turned out to be a bug on Google’s side.

11- Not Found (404)

The page returns a 404 error status code when Google requests it. Googlebot did not find the page through a sitemap, but probably through another website that links to the URL; it is also possible that the URL existed in the past and has since been removed.

Actions to be taken:

  • If the 404 response is intended, you can leave it as is; it will not harm your SEO performance. However, if the page has moved, set up a 301 redirect to its new location (a minimal example follows below).
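
A 301 redirect can be set up in many ways depending on your stack; as one hedged example, on an Apache server it can be a single line in .htaccess (the paths are hypothetical).

    # .htaccess (Apache): permanently redirect a moved page to its new URL
    Redirect 301 /old-page/ https://example.com/new-page/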

12- The page was removed due to a legal complaint

This page has been removed from the index due to a legal complaint.

Actions to be taken:

  • Check the legal rules you may have violated and take action to correct them.

13- Page with a redirect

This URL is a redirect and therefore not indexed.

Actions to be taken:

  • If the URL is not supposed to redirect, remove the redirect implementation.

14- Soft 404

The page returns what Google considers a soft 404 response. The page is not indexed because, even though it returns a 200 status code, Google thinks it should return a 404.

Actions to be taken:

  • See if you should set a 404 for the page, as Google suggests.
  • Add valuable content to the page to let Google know it’s not a Soft 404.

15- Duplicate, submitted URL not selected as canonical

You have submitted the URL to GSC for indexing. However, it is not indexed because the page has duplicates without canonical tags, and Google considers another URL a better candidate for the canonical page.

Actions to be taken:

  • Decide if you want to continue with Google’s selection of the canonical page. In this case, set the rel = canonical attributes to point to the page specified by Google.
  • You can use the URL Inspection tool to see which page Google has chosen as the canonical page.
  • If you want this URL as the canonical, analyze why Google prefers the other page. Offer more high-value content on the page of your choice.

Step 3: The most common issues in the Index Coverage report

Now you know the different types of errors you can find in the Index Coverage report and the actions to take when you encounter each error. Below is a brief overview of the issues that arise frequently.

More excluded pages than valid pages

Sometimes you may have more excluded pages than valid ones. This usually happens on large sites that have gone through a significant URL change, for example an old site with a long history or one whose code has been substantially modified.

If there is a large difference between the number of pages in the two states (excluded and valid), you have a serious problem. Start by reviewing the excluded pages, as explained above in the Index Coverage report section.

Error spikes

When the number of errors increases dramatically, you need to identify the cause and fix it as soon as possible: Google has detected issues that seriously damage your website’s performance. If you don’t fix the problem today, you’ll have bigger problems tomorrow.

Server errors

Make sure these errors are not 503 (Service Unavailable). This status code means that the server cannot process the request due to temporary overload or maintenance. At first the error should go away on its own, but if it keeps happening, you should investigate the problem and fix it.

404 errors

If the number of 404 (Page Not Found) errors increases significantly, Google has detected areas of your website that are generating pages it cannot find.

Missing pages or sites

If you can’t see a page or site in the report, it could be due to several reasons.

1- Google hasn’t discovered it yet. When the page or site is new, it may take some time before Google finds it. Submit a sitemap or page crawl request to speed up the indexing process. Also, make sure that the page is not an orphan and is linked from the site.
2- Google cannot access your page based on the login request. Remove the authorization requirement to allow GoogleBot to crawl the page.
3- The page has a noindex tag or has been omitted from the index for some reason. Remove the noindex tag and make sure you provide valuable content on the page.

Errors and exceptions

This problem occurs when there is a discrepancy: if you submit a page via a sitemap, you must make sure that it is indexable and that it is linked from within the site. Your site should consist mostly of valuable pages worth linking to.

Summary

Here is a three-step summary of the “Solving the Index Coverage Problem” article.

  • The first thing you want to do when using the Index Coverage report is to fix the pages that are in the Error state. This should be done to avoid Google penalties.
  • Second, check the excluded pages and see if these are the pages you don’t want to be indexed. If not, follow our troubleshooting instructions.
  • If you have time, we highly recommend checking the valid pages with a warning. Make sure the instructions you provided in your robots.txt file are correct and that there are no inconsistencies.

We hope you find this article useful. Let us know if you have any questions regarding the Index Coverage Report.
