October 31, 2018

Crawl Errors in Google Search Console: The Ultimate Guide


A few days after I cleaned my WordPress website following a virus attack, I noticed many 404 errors in Google Search Console. Googlebot was crawling non-existent pages (404 pages).

Are you searching for how to fix a Google 404 error page or how to fix soft 404 errors? This guide answers how to remove Google 404 errors, how to fix URL problems, and how to fix "crawl errors 404 not found".

This post will show you useful tools: crawl error checkers, the Google Webmaster Tools broken link checker, 404 Not Found tracker plugins for WordPress, IP address checkers, the robots.txt Tester, and Fetch as Google.

These best practices will help reduce the Error 404 Not Found entries in Google Search Console and the 404 Not Found pages on your website.

Table of Contents

You can use CTRL + F to find the heading and read the content.
  1. What is an Error 404 Not Found Crawl Error
  2. What Increased Crawl Errors 404 Not Found Requests Mean
  3. Google Webmaster Tools Broken Link Checker
  4. How to Access Crawl Errors in Google Search Console
  5. Types of Crawl Errors in Google Search Console
  6. Crawl Errors Checker to Track 404 Not Found URLs
  7. Understand the 404 Non-Existing URL Structure to Identify Spam
  8. Check IP Addresses to Find and Fix URL Errors in Google Search Console
  9. How to Remove Google 404 Errors Using the robots.txt File
  10. Fetch as Google Tool - Crawler Simulator
  11. Permalink Settings to Create a Custom URL Structure
  12. Use a Proper robots.txt File
  13. Crawl Errors Checker - robots.txt Tester in Google Search Console
  14. Block Unknown IP Addresses to Fix Google Crawl URL 404 Not Found
  15. How to Identify Bad IP Addresses and Block Them
  16. How to Block IP Addresses in .htaccess to Remove Spam Crawl Errors
  17. How to Fix Crawl Errors in Google Webmaster Tools Using a Sitemap
  18. Remove URLs from the Index with Google Search Console
  19. How to Deal with 404 Errors from Facebook

What is a 404 Error Not Found

An increase in "404" crawl errors in Google Search Console is an issue many websites have. More "404" pages can result in lower rankings and negative SEO. Why do these "404" pages and Crawl Errors 404 Not Found appear in Google Search Console, and how do you fix them?

An Error 404 Not Found appears when a page on the website is moved or deleted. Google requests the page from the website's server. The server searches its local database, cannot find the page Google is asking for, and answers the request with a 404 error. This means the server was unable to find the page.

How to Fix 404 Error on Google Search Console


Increased 404 Not Found errors mean many pages on the website have been moved or deleted. Google lists all the pages of the website and stores them in its index. Google will keep these pages for a long time even if you delete them from your website; it assumes the page is still there and keeps asking the server for it.

What Increased Crawl Errors 404 Not Found Requests Mean

Increased Error 404 Not Found requests mean that Google is having trouble reaching the pages it has listed for your website. This happens in a few cases.

1. A large number of pages have been removed or deleted.
2. A spam attack resulting in increased crawl errors.
3. Low-quality spam backlinks.
4. Configuration issues in installed plugins.
5. Malware and hacked content inside the website folders - here is a good read on how to Fix Hacked Content Found in Google Search Console.

You will have to identify why these 404 errors are appearing and what causes them. A 404 error is a client-side error, which means the user visiting your website sees this error instead of the actual web page.
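The request cycle described above can be sketched with Python's standard library. This is only a sketch; the URL in the comment is a placeholder, not a real page.

```python
from urllib.request import urlopen
from urllib.error import HTTPError

def status_of(url):
    """Fetch a URL and return the HTTP status code the server answers with."""
    try:
        return urlopen(url).status
    except HTTPError as err:
        return err.code  # a 404 raises HTTPError; its code is the status

def is_client_error(code):
    """4xx codes, including 404 Not Found, are client-side errors."""
    return 400 <= code < 500

# e.g. status_of("https://example.com/deleted-page") would return 404
# for a deleted page, and is_client_error(404) is True.
```

This is the same response Googlebot receives when it requests a deleted page.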


→ Crawl errors affect ranking

Users will have a bad experience because of this. They will close the web page immediately or click back to return to Google. This results in an increased bounce rate and, ultimately, loss of search ranking.

Google mentioned long ago that 404 (Not Found) errors will not hurt your website's search ranking. That is not entirely true: Google reduces the ranking of websites that give a bad user experience. Returning a 404 Not Found result code is not fine, and you will have to reduce such responses.

It is easier to find these kinds of crawl errors in the new Search Console, though I still prefer the old Search Console for ease of use.

Google Webmaster Tools Broken Link Checker

There are many software applications, such as Screaming Frog and Xenu, that crawl an entire website for errors. You can choose one, find all crawl errors, and match them with the errors in Google Search Console.



You can also use an online broken link checker to find all broken links. I use Broken Link Check to see the errors. It shows only 404 errors related to posts and pages, which makes it a useful tool for beginners.

How to Access Crawl Errors in Google Search console

Crawl errors can be accessed in Google Search Console under the Crawl section. There is no need to panic when you see many Crawl Errors 404 Not Found in Google Search Console. Give yourself credit for starting the work of fixing each of them and building a healthy website.

You will see many 404, 500, 503, "Soft 404", and 400 errors in this section. The section is divided into two segments - Site Errors and URL Errors. All the crawl errors are available in these two sections.

Site Errors in Google Search Console

You can see the Site Errors section at the top, showing data from the last 90 days. Site errors cover DNS, server connectivity, and robots.txt fetch. Generally, these three subsections will not show any errors.

URL Errors 404 Not Found in Google Search Console

URL errors are divided into two segments - Desktop and Smartphone.
  • Server error, Access denied, Not found, and Other errors are visible under Desktop.
  • Server error, Access denied, Not found, Blocked, and Other errors are visible under Smartphone.

This is where the work starts.

Types of Crawl Errors in Google Search Console

This section will help you fix all crawl errors and 404 Not Found URLs. Let us dive into each type of crawl error in Google Search Console.

Server Error HTTP 5XX Google Search Console

This error is caused when a server rejects Googlebot's request. Googlebot is a small software application that crawls your website. Google cannot index a web page if the server rejects the request. A Server error 5xx is shown in Google Search Console when the server returns a 500-range status code.

“Soft 404” errors

Googlebot decides that some pages on the website have thin content or no content. The pages do exist, and the server answers the request with a 200 status code, but these pages are still listed as "Soft 404" errors. Pages with only a few words, or pages containing only ads, are considered Soft 404 errors.

Access Denied

Some servers automatically deny bots access, for example to prevent directory listings; this is why Access Denied 403 errors are shown. Google generally wants to crawl all files and folders of the website. This can also happen with incorrect domain and DNS configurations.


Not Found Errors

This section contains all 404 Not Found errors related to posts, pages, and attachments. These errors are common and can increase drastically with a website migration or a change from HTTP to HTTPS.

Blocked Errors

Webmasters use the robots.txt file to block access to certain pages, so some pages are denied to Googlebot.

Other Errors

This section includes 400, 405, and 406 errors. Googlebot could not fetch the page but could not determine the exact issue, so the error is shown with an "Other undetermined reason" message.

URL Parameters and Crawl Stats

There are also sections called URL Parameters and Crawl Stats under Crawl. There is no need to edit these.

Crawl Errors Checker to Track 404 Not Found URL

To fix these crawl errors in Google Search Console, identify the structure, the period of occurrence, and the IP address of each 404 URL. The Wordfence plugin and the 404 to 301 plugin are good crawl error checkers that let you track 404 Not Found URLs.

For that, you need to install the Wordfence plugin and the 404 to 301 plugin on your WordPress website. Wordfence shows you live traffic, including all server errors; 404 to 301 lists all 404 errors.

The 404 URLs appear in Wordfence under Tools > Live Traffic > Google Crawlers. These errors are reported by Googlebot, crawlers, and other bots.


You will see several columns here, including Type, Location, Page, Visited, Time, IP Address, Hostname, and Response View. Watch the Response View and compare the URL errors appearing here with the URL errors you saw in Google Search Console. If they match, click on the 404 error to expand the activity detail.


Here is an example of the activity detail of Googlebot trying to access a non-existent 404 page. Google Search Console will show these crawl errors for pages that don't exist; this results from a targeted spam attack.

Activity Detail
Aliso Viejo, United States was redirected when visiting https://healthcostaid.com/eaQYwcb10213804804b-h/o_dDgRfUr099159f02.bar
10/24/2018 2:40:30 PM (5 minutes ago)
IP: 66.249.69.84 Hostname: crawl-66-249-69-84.googlebot.com
Browser: Chrome version 0.0 running on Android
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

First, check the structure of this URL to see if it is spam.

Wordfence Options To Block Spam IP Address


Understand the 404 Non-Existing URL Structure to Identify Spam

I installed another plugin, called 404 to 301, which gives access to all 404 error logs. I checked how many non-existent URLs were appearing. This method was featured on many websites.



The steps below have to be done carefully. As you can see below, the 404 URL has several parts.

http://healthcostaid.com - this is your website.

/eaQYwcb10213804804b-h - Second path

/o_dDgRfUr099159f02 - Third path

.bar - File extension

These are spam URLs. Someone is attacking your website.

Here are a few other spam URLs that Google is showing in Google Search Console as server errors.

http://healthcostaid.com/0ohPdcGCZSSbb10196r-i/g_mba682r-i/g_mlUho-dRGVqtjM4e335
http://healthcostaid.com/42qCcUQctTb102138d9215b-h/o_dXBGorlWtsg9dbfdd96c.bar
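If you have a long 404 log, a small heuristic script can separate such random-token URLs from real post slugs. This sketch is tuned only to the samples shown above (mixed-case tokens containing digits, or the .bar extension); it is not a general spam detector.

```python
import re

def looks_like_spam(path):
    """Flag URL paths that match the spam pattern seen in the samples above."""
    if path.endswith(".bar"):  # the odd file extension from the examples
        return True
    for segment in path.strip("/").split("/"):
        # Real post slugs on this site are lowercase-hyphenated; the spam
        # tokens mix upper case, lower case, and digits.
        if re.search(r"[A-Z]", segment) and re.search(r"\d", segment):
            return True
    return False
```

Run it over the path column exported from the 404 to 301 log to shortlist URLs worth investigating.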

If they are spam, the next step is to identify the source of the IP address and see if the requests actually come from Googlebot.

Check IP Addresses to Find and Fix URL Errors in Google Search Console

Here are two spam requests to the website. You will have to find the IP address of each request to see where the traffic is coming from. By checking the IP address, you can fix URL errors in Google Search Console.

http://healthcostaid.com/7VlVORp-Vb10196r-i/g_m71c7er-i/g_mmiJgjJpTote6b68 from IP: 66.249.69.126
http://healthcostaid.com/1FiPKMF_-xKob10196r-i/g_m0d1ffr-i/g_mHSjWrFf-XsXwl814a1 from IP: 66.249.69.116

Use an IP checker such as IPINFO to find the source of the IP. I checked the source of this IP, and this is the result.


The IP belongs to Googlebot, so you should not block it. These are genuine Googlebot requests, and you will have to block the path of this 404 URL in the robots.txt file to stop these errors from being crawled.

How to Remove Google 404 Errors Using the robots.txt File

You can remove Google 404 errors permanently using the robots.txt file. I installed the Yoast plugin and went to Tools > File editor.


Here you can create and edit the robots.txt file. Create a robots.txt file if you do not have one.

You have to be very careful when editing this file.

Create Disallow rules to block such spam URLs.

1. http://healthcostaid.com - this is your website.

/eaQYwcb10213804804b-h - Second path

/o_dDgRfUr099159f02 - Third path

.bar - File extension

So I created a rule like this: Disallow: /*/*.bar

2. http://healthcostaid.com

/7VlVORp-Vb10196r-i

/g_m71c7er-i

/g_mmiJgjJpTote6b68

Here I made the rule like this: Disallow: /*/*?*

You can create rules like these and add them to the robots.txt file. I clicked "Save changes to robots.txt" to save the configuration.
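Before saving, you can sanity-check that a wildcard rule actually matches the spam path. Python's built-in robots.txt parser does not understand `*` wildcards, so this sketch approximates Googlebot's matching by hand:

```python
import re

def rule_matches(rule, path):
    """Approximate Googlebot's Disallow matching: '*' matches any run of
    characters, a trailing '$' anchors the end, otherwise prefix match."""
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    return re.match(pattern, path) is not None
```

For example, `rule_matches("/*/*.bar", "/eaQYwcb10213804804b-h/o_dDgRfUr099159f02.bar")` is True, while a normal lowercase post slug does not match the rule.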

Block 404 Non Existing Urls Spam using Robot.txt

This blocked bots from crawling both non-existent 404 URLs. I rechecked the URLs in Fetch as Google, and it showed "Blocked". This means the Google 404 error issue is fixed.

You have to be doubly careful when editing the robots.txt file. Google may drop pages from search results, and you may lose rankings.

Fetch as Google tool - Crawler Simulator

Fetch as Google tool enables the user to fetch and render URLs. This crawler simulator will help you to see the live status of URLs. Go to Search Console > Crawl > Fetch as Google. Add the URL in the empty box and click Fetch.


This immediately triggers Googlebot to scan the URL and shows you whether it renders properly or not. The status shows "Complete" with a green tick mark if Googlebot was able to crawl the web page successfully. If you blocked the path or file in robots.txt, the status will be "Blocked".

Permalink Settings to create a custom URL structure

I made sure that none of my post pages are appearing in this type of path. I went to Permalink Settings to create a custom URL structure.



I added "/%postname%" as the custom structure. This way, all my posts appear with a custom structure like this: https://healthcostaid.com/consequences-for-not-paying-medical-bills. The path of this would be /*/. I will not block this path in robots.txt; if I did, Googlebot would not be able to access my site.

Use a Proper robots.txt File

A proper robots.txt file blocks the spam and junk URLs and allows other files to be crawled. This will immediately reduce Crawl Errors 404 Not Found in Google Search Console. Below is a customized robots.txt file for a WordPress website. Copy it and paste it into your robots.txt file.

User-agent: *
Allow: /

User-agent: Mediapartners-Google
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Video
Allow: /

User-agent: Googlebot-News
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://www.shipmethis.com/sitemap.xml (replace with your own sitemap address)

Google Search Console automatically adds a sitemap entry for your site. It will appear like the example below.

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Sitemap: https://www.shipmethis.com/sitemap.xml


This will allow all Google bots to access posts and pages.

Crawl Errors Checker - robots.txt Tester in Google Search Console

The robots.txt Tester is an excellent tool in Google Search Console. This crawl errors checker lets you find out whether Googlebot is allowed to crawl your posts and pages, and it checks for crawl errors and path issues.

Go to Crawl > robots.txt Tester to test if the spam URLs are blocked.


Check if your post URLs are allowed.


Check if Google Bots are allowed.


This means you have successfully updated the robots.txt file and fixed most of the Crawl Errors 404 Not Found in Google Search Console.

Keep track of any loss of search rankings and how your results show up in Google. Be careful while you do this. The safest way is to block spam IP addresses.

Block Unknown IP Addresses to Fix Google Crawl URL 404 Not Found

Many spammers attack your website without you knowing about it. You will have to find these unknown IP addresses and block them to keep spammers out.

Spammers often plant encrypted PHP scripts that generate huge numbers of links inside your website, which creates a bad user experience. Spammers also ping your site in high volumes [brute force attack] to find vulnerabilities and attack it.



Many future crawl errors can be prevented by blocking such bad IP addresses. You can read how to block a specific IP address, a specific domain, multiple IP addresses, and an entire subnet in this article. I always insist on blocking the entire subnet of IP addresses to be on the safe side.

Do not block the IP addresses of Google, MSN, Bing, Microsoft, Facebook, etc. Here is a list of IP ranges and bot IDs that you should not block. NetRange means the IP addresses can come from anywhere in that range; hostname is the name of the bot.

Google Bot - NetRange: 66.249.64.0 - 66.249.95.255 - Hostname: crawl-66-249-69-84.googlebot.com

Facebook Bot - NetRange: 173.252.64.0 - 173.252.127.255 - Hostname: 173.252.87.7  facebookexternalhit/1.1

Amazon - NetRange: 54.208.0.0 - 54.221.255.255 - Hostname: ec2-54-210-188-114.compute-1.amazonaws.com

MSN - NetRange: 40.74.0.0 - 40.125.127.255 - Hostname: msnbot-40-77-167-18.search.msn.com
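Before blocking an IP, you can check whether it falls inside one of these safe ranges. Python's ipaddress module can compare addresses directly; a small sketch:

```python
import ipaddress

def in_range(ip, start, end):
    """True if `ip` lies inside the inclusive range start..end."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(start) <= addr <= ipaddress.ip_address(end)

# The Googlebot IP from the activity detail sits inside the NetRange above:
# in_range("66.249.69.84", "66.249.64.0", "66.249.95.255") is True
```

Any IP inside these ranges should be left alone; anything outside them is a candidate for the checks in the next section.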

How to Identify Bad IP Addresses and Block Them

Go to the Wordfence or 404 to 301 plugin and check the IP addresses of the 404 errors. Copy each IP and check it in an IP info checker. If they come from China or Russia, or from some broadband servers, block them immediately.


I checked the IP address 123.168.150.36 in an IP info trace.



As per the WHOIS record, this belongs to an unknown organisation, which means it is a spam IP address. You should immediately block this subnet.

How to Block IP Addresses in .htaccess to Remove Spam Crawl Errors

To block this IP address, install the Yoast plugin, go to File Editor, and scroll down to access the .htaccess file.


To block a subnet of IPs, add the first two octets of the IP after "Deny from".

If you want to block IP 123.168.150.36, use 123.168 and write the directive like this:

Deny from 123.168

Click "Save changes to .htaccess" to save the changes. This will block the entire range of IP addresses from accessing your site.
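Note that `Deny from 123.168` covers the whole 123.168.0.0/16 subnet. `Deny from` is Apache 2.2 syntax; on Apache 2.4 hosts the equivalent uses `Require` directives. A sketch of the same block, assuming Apache 2.4 - check which syntax your host supports before pasting:

```apache
# Apache 2.2 style, as used above:
#   Deny from 123.168
# Apache 2.4 equivalent for the same /16 subnet:
<RequireAll>
    Require all granted
    Require not ip 123.168.0.0/16
</RequireAll>
```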

You can directly add the following spam IP ranges to your .htaccess to stop them from accessing your site.

Deny from 217.23
Deny from 185.234
Deny from 31.211
Deny from 59.59
Deny from 123.168
Deny from 115.235
Deny from 39.104
Deny from 193.169
Deny from 47.92
Deny from 114.99
Deny from 36.26
Deny from 62.210
Deny from 183.158

These IP addresses are spam sources; they create Crawl Errors and 404 Not Found errors in Google Search Console and attack your website.

How to Fix Crawl Errors in Google Webmaster Tools Using a Sitemap

It is possible that the URLs Google is attempting to scan are cached or not properly updated. If requesting Google to re-crawl your website and its URLs does not resolve this, it would be best to consult an experienced website developer, as this relates to the configuration of your website.

I would advise you to generate a new sitemap for your site (you may use an online tool) and ask Google to re-crawl your website's URLs. You can do so as described on the Google Support page about resubmitting sitemaps.

The Yoast plugin automatically creates a new sitemap index. You can access it at
https://healthcostaid.com/sitemap_index.xml (change healthcostaid.com to your own website address). Submit this to Google under Crawl > Sitemaps > Test > Resubmit.
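If you prefer to generate a minimal sitemap yourself instead of relying on a plugin or online tool, the format is plain XML. A sketch with Python's standard library (the URLs passed in are placeholders):

```python
import xml.etree.ElementTree as ET

def build_sitemap(page_urls):
    """Build a minimal sitemap XML document from a list of page URLs."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in page_urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page
    return ET.tostring(urlset, encoding="unicode")
```

Save the returned string as sitemap.xml at the site root, then resubmit it under Crawl > Sitemaps.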

This allows Google to crawl the existing URLs and removes the blocks that were preventing Googlebot's access. It will generally fix most crawl errors in Google Webmaster Tools, including "submitted URL marked noindex" errors.

Remove URLs from the index with the Google Search Console

Remove URLs is a section under Google Index. You can use it to temporarily hide URLs from search results. This lets you hide certain URLs, such as spam URLs and bad backlinks, until you find the source page and remove them.


Select Temporarily hide and enter the URL. Select Continue. Select "Temporarily hide pages from search results and remove from cache" and click Submit Request. All www/non-www and http/https variations of that URL will be hidden from search results.

How to deal with 404 errors from Facebook

Facebook bots also produce 404 errors, which you can see in the Wordfence plugin. These 404s appear when an attachment or post you shared on Facebook has been moved or deleted. The Facebook bot searches for it on the website and cannot find the post or attachment.



To fix these 404 errors, go to Wordfence Live Traffic and check for the 404 errors crawled by facebookexternalhit/1.1. Then go to your Facebook page > Publishing Tools > Published Posts, find the matching post or attachment URLs, and delete them. You can handle all 404 errors from Facebook bots like this.

Do share this post with other bloggers, website owners, social pages, and groups, and with anyone struggling to fix 404 crawl errors in Google Search Console. This post would be really helpful for them.
