Having problems getting indexed in Google? This issue can result in lower traffic and conversion rates. To solve it quickly, you need to check both the indexed and non-indexed pages of your site. Here we explain how to do it step by step using the Google Search Console Index Coverage report. Using the following method, we were able to fix index coverage issues on hundreds of sites with millions, or even billions, of excluded pages. Use it so that none of your relevant pages lose visibility in search results, and boost your SEO traffic.
The Search Console Coverage report tells you which pages have been crawled and indexed by Google, and why the URLs are in that specific state. You can use it to detect any errors that are found during the crawling and indexing process.
To check the Index Coverage report, go to Google Search Console and click “Coverage” (just below “Index” in the sidebar). Once you open it, you’ll see a summary of four different statuses that categorize your URLs:
You need to check and correct all the pages in the Error status as soon as possible, because otherwise you may lose the opportunity to bring traffic to your site. If you have time, also review the pages in the Valid with warnings status, as it may contain pages that should under no circumstances appear in search results. Finally, make sure that the excluded pages really are pages that you do not want indexed. There are several variants of the index coverage issue, which we present below.
Once you open the Index Coverage report, select the desired status (Error, Valid with warnings, or Excluded) and review the details provided at the bottom of the page. You will find a list of issue types sorted by severity and by the number of affected pages, so we recommend starting your investigation from the top of the table. Let’s look at each case and how you can fix it.
Server Errors (5xx)
These are URLs that returned a 5xx status code to Googlebot, which is an index coverage issue.
Actions to be taken:
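As a starting point, a small script along these lines can confirm which URLs still return 5xx codes and which have already been fixed. This is a minimal sketch, assuming the `requests` library is installed and using a hypothetical list of URLs copied from the report:

```python
import requests

# Hypothetical URLs copied from the "Server error (5xx)" section of the report.
urls = [
    "https://www.example.com/page-1",
    "https://www.example.com/page-2",
]

for url in urls:
    try:
        # Use a short timeout so one slow URL does not block the whole check.
        response = requests.get(url, timeout=10)
        if 500 <= response.status_code < 600:
            print(f"STILL FAILING {response.status_code}: {url}")
        else:
            print(f"OK {response.status_code}: {url}")
    except requests.RequestException as exc:
        # Network-level failures (DNS, connection reset, timeout) also need attention.
        print(f"REQUEST FAILED: {url} ({exc})")
```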
Googlebot encountered an error during the redirect process that prevented the page from being crawled. This problem is most likely caused by a redirect chain that is too long, a redirect loop, or a redirect that points to a bad or empty URL.
Actions to be taken:
Get rid of redirect chains and loops. Make each URL perform only one redirect, that is, a single redirect from the first URL straight to the final URL.
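To spot chains and loops before Googlebot does, you can check how many hops each URL takes to reach its final destination. A minimal sketch, assuming the `requests` library and a hypothetical `urls` list:

```python
import requests

urls = ["https://www.example.com/old-page"]  # hypothetical URLs to audit

for url in urls:
    try:
        # requests follows the redirect chain and records each hop in .history
        response = requests.get(url, allow_redirects=True, timeout=10)
        hops = len(response.history)
        if hops > 1:
            chain = " -> ".join([r.url for r in response.history] + [response.url])
            print(f"{hops} redirects (consider a single redirect instead): {chain}")
        elif hops == 1:
            print(f"Single redirect (fine): {url} -> {response.url}")
        else:
            print(f"No redirect: {url}")
    except requests.TooManyRedirects:
        print(f"Redirect loop detected: {url}")
    except requests.RequestException as exc:
        print(f"Request failed: {url} ({exc})")
```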
Actions to be taken:
Check whether the page in question should actually be indexed by search engines or not.
The submitted URL is marked ‘noindex’
These pages were submitted to Google via an XML sitemap, but they have a “noindex” directive in either the robots meta tag or the HTTP headers.
Actions to be taken:
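Before resubmitting, it helps to verify where the “noindex” directive comes from: the robots meta tag or the X-Robots-Tag HTTP header. A minimal sketch, assuming `requests` and `beautifulsoup4` are installed and using a hypothetical URL from the report:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/submitted-page"  # hypothetical URL from the report

response = requests.get(url, timeout=10)

# 1. Check the X-Robots-Tag HTTP header.
header = response.headers.get("X-Robots-Tag", "")
if "noindex" in header.lower():
    print(f"noindex found in X-Robots-Tag header: {header}")

# 2. Check the robots meta tag in the HTML.
soup = BeautifulSoup(response.text, "html.parser")
for meta in soup.find_all("meta", attrs={"name": "robots"}):
    content = (meta.get("content") or "").lower()
    if "noindex" in content:
        print(f"noindex found in robots meta tag: {content}")
```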
The URL was submitted to Google via an XML sitemap but returns a 401 error. This status code means that the request is not authorized to access the URL: a username and password may be required, or access may be restricted by IP address. This is another of the index coverage issues.
Actions to be taken:
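Since Googlebot crawls without credentials, an unauthenticated request mirrors what it sees and lets you confirm whether the 401 is gone after your fix. A minimal sketch with a hypothetical URL:

```python
import requests

url = "https://www.example.com/members/report"  # hypothetical URL from the report

# No credentials are sent, just like a normal Googlebot crawl.
response = requests.get(url, timeout=10)

if response.status_code == 401:
    print("Still returns 401: Googlebot cannot access this URL without credentials.")
    print("Either remove the authorization requirement or drop the URL from the sitemap.")
else:
    print(f"Returns {response.status_code}: the authorization block appears to be gone.")
```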
You submitted the URL to Google Search Console for indexing, but Google cannot crawl it due to an issue other than the ones above.
Actions to be taken:
You submitted the URL to GSC for indexing, but Google cannot crawl it due to a different issue than the ones above.
Actions to be taken:
These pages are indexed even though they are blocked by robots.txt. Google always tries to follow the directives given in the robots.txt file, but it sometimes behaves differently, and this is an index coverage issue. This can happen, for example, when someone else links to the given URL. These URLs are flagged in this category because Google is not sure whether you really want these pages kept out of search results.
Actions to be taken:
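To double-check whether your robots.txt actually blocks Googlebot from a given URL, Python’s built-in `urllib.robotparser` can be used. A minimal sketch with a hypothetical robots.txt location and URL:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt location and a URL flagged as
# "Indexed, though blocked by robots.txt" in the report.
robots_url = "https://www.example.com/robots.txt"
urls = ["https://www.example.com/private/report.html"]

parser = RobotFileParser()
parser.set_url(robots_url)
parser.read()  # downloads and parses robots.txt

for url in urls:
    if parser.can_fetch("Googlebot", url):
        print(f"ALLOWED for Googlebot (robots.txt is not blocking it): {url}")
    else:
        # Blocking crawling does not prevent indexing; use a noindex directive
        # if the page must stay out of search results.
        print(f"DISALLOWED for Googlebot by robots.txt: {url}")
```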
These pages are not indexed in search results, and Google thinks that is the right thing to do. For example, this could be because they are duplicates of indexed pages, or because your website instructs search engines not to index them. Below we show 15 cases in which your page can be excluded; these cases are examples of an index coverage problem.
You are telling search engines not to index the page by using the “noindex” directive.
Actions to be taken:
Blocked by Page Removal Tool
Actions to be taken:
The page can still be indexed if Google can find information about it without loading it.
You are preventing Googlebot from accessing these pages with a robots.txt file. However, Google may have indexed the page before the disallow rule was added to robots.txt.
Actions to be taken:
Googlebot is denied access to the page because it requires authorization (401 response).
Actions to be taken:
The page was not indexed because it returned a 4xx or 5xx error response code.
Actions to be taken:
This page was crawled by GoogleBot but not indexed. It may or may not be indexed in the future. There is no need to submit this URL for crawling.
Actions to be taken:
Google found this page, but it hasn’t been able to crawl it yet. This situation usually occurs because when GoogleBot tried to crawl the page, the site was overloaded. The crawl is scheduled for another time. No action is required.
This page points to a canonical page, so Google understands that you don’t want it indexed.
Actions to be taken:
The page has exact duplicates, but none of them is marked as the canonical page, so Google does not consider this page to be the canonical version.
Actions to be taken:
You’ve marked this page as the canonical page, but Google has instead indexed another page that it thinks works better as the canonical.
Actions to be taken:
Review your canonical tags and make the necessary changes. Use the URL Inspection tool to discover the “canonical page” selected by Google.
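It can also help to confirm which canonical URL the page itself declares before comparing it with Google’s choice in the URL Inspection tool. A minimal sketch, assuming `requests` and `beautifulsoup4` and a hypothetical URL:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com/product?color=blue"  # hypothetical URL from the report

response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Look for <link rel="canonical" href="..."> in the page markup.
link = soup.find("link", rel="canonical")
if link and link.get("href"):
    print(f"Declared canonical: {link['href']}")
else:
    print("No rel=canonical link found on this page.")
```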
One of the more curious “failures” we experienced with the Index Coverage report was the discovery that Google wasn’t properly processing our canonical URLs (and we assumed we had been doing it wrong for years). Search Console was reporting that the specified canonical page was invalid even though the page was correctly marked up. In the end, it turned out to be a bug on Google’s side.
The page returns a 404 status code when Google requests it. Googlebot did not find the page through the sitemap, but probably through another website that links to the URL. It is also possible that this URL existed in the past and was later removed.
Actions to be taken:
This page has been removed from the index due to a legal complaint.
Actions to be taken:
This URL is a redirect and therefore not indexed.
Actions to be taken:
The page returns what Google considers a soft 404 response. The page is not indexed because, even though it provides a 200 status code, Google thinks it should return a 404.
Actions to be taken:
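A quick way to triage suspected soft 404s is to confirm that the URL really returns 200 and then look for “not found”-style wording or a nearly empty body. A rough sketch with a hypothetical URL and phrases (the real fix is to return a proper 404/410 or to add genuine content):

```python
import requests

url = "https://www.example.com/discontinued-product"  # hypothetical URL from the report

response = requests.get(url, timeout=10)
body = response.text.lower()

# Phrases that often indicate an error page served with a 200 status code.
not_found_phrases = ["page not found", "no longer available", "0 results"]

print(f"HTTP status: {response.status_code}")
if response.status_code == 200:
    if any(phrase in body for phrase in not_found_phrases) or len(body) < 512:
        print("Looks like a soft 404: returns 200 but the content suggests an error or empty page.")
    else:
        print("Content looks substantive; review why Google treats it as a soft 404.")
```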
You have submitted the URL to GSC for indexing. However, it is not indexed because the page has duplicates without canonical tags, and Google considers another page to be a better candidate for the canonical.
Actions to be taken:
Now you know the different types of errors you can find in the Index Coverage report and the actions to take when you encounter each error. Below is a brief overview of the issues that arise frequently.
Sometimes you can have more excluded pages than valid pages. This is usually seen on large sites that have gone through a significant URL change, for example an old site with a long history or a site whose code has been modified.
If there is a large difference between the number of pages in the two statuses (excluded and valid), you have a serious problem. Start by reviewing the excluded pages, as we explained above in the Index Coverage report section.
When the number of errors increases dramatically, you need to check the error and fix it as soon as possible. Google has detected some issues that seriously damage your website’s performance. If you don’t fix the problem today, you’ll have big problems tomorrow.
First, check whether these errors are 503 (Service Unavailable) responses. This status code means that the server cannot process the request due to temporary overload or maintenance. Normally the error goes away on its own, but if it keeps happening, you should investigate the problem and fix it. You may also find that Google has detected areas of your website that generate 404 (Page Not Found) errors; if their number increases significantly, those sections need to be reviewed as well.
If you can’t see a page or site in the report, it could be due to several reasons.
1- Google hasn’t discovered it yet. When the page or site is new, it may take some time before Google finds it. Submit a sitemap or page crawl request to speed up the indexing process. Also, make sure that the page is not an orphan and is linked from the site.
2- Google cannot access your page because it requires a login. Remove the authorization requirement to allow Googlebot to crawl the page.
3- The page has a noindex tag or has been dropped from the index for some reason. Remove the noindex tag and make sure you provide valuable content on the page.
This problem occurs when there is a discrepancy between what you submit and what Google can actually index. If you submit a page via a sitemap, you must ensure that it is indexable and that it is linked from within the site. Your site should mostly consist of valuable pages worth linking to.
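One way to catch this discrepancy early is to walk through your own sitemap and check that every submitted URL responds with 200 and carries no “noindex” directive. A minimal sketch, assuming `requests` is installed and using a hypothetical sitemap URL (it handles a simple flat sitemap, not sitemap index files):

```python
import requests
import xml.etree.ElementTree as ET

sitemap_url = "https://www.example.com/sitemap.xml"  # hypothetical sitemap location
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Download and parse the sitemap (a simple <urlset> file, not a sitemap index).
root = ET.fromstring(requests.get(sitemap_url, timeout=10).content)
urls = [loc.text for loc in root.findall(".//sm:loc", ns)]

for url in urls:
    response = requests.get(url, timeout=10)
    problems = []
    if response.status_code != 200:
        problems.append(f"status {response.status_code}")
    if "noindex" in response.headers.get("X-Robots-Tag", "").lower():
        problems.append("noindex in X-Robots-Tag header")
    if "noindex" in response.text.lower():
        # Crude check; a proper parser should inspect only the robots meta tag.
        problems.append("'noindex' string found in HTML")
    print(f"{'PROBLEM' if problems else 'OK'}: {url} {', '.join(problems)}")
```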
Here is a three-step summary of the “Solving the Index Coverage Problem” article.
We hope you find this article useful. Let us know if you have any questions regarding the Index Coverage Report.