How to control Sitechecker's Web Crawler?

"Crawler" is a generic term for any program (such as a robot or spider) that automatically discovers and scans websites by following links from one webpage to another. Sitechecker's Web Crawler doesn't crawl every website on the internet. It crawls only the websites and pages that users have asked it to scan.

Parameters of Sitechecker's Web Crawler:

Tools in Sitechecker Platform where SiteCheckerBotCrawler works:

  • Site Audit
  • Site Monitoring
  • On-Page Checker/Page Details report

How SiteCheckerBotCrawler scans your website

SiteCheckerBotCrawler's crawling process starts with a user's request to crawl a specific domain or URL.

In the On-Page Checker and the Page Details report, SiteCheckerBotCrawler scans only a specific URL and its internal and external links.

In Site Audit, SiteCheckerBotCrawler scans all URLs it finds on the website, starting from the homepage. So, if your website has a page that no other page links to, the crawler won't detect that page (unless it is listed in the website's sitemap or in your Google Search Console).
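To illustrate why a page without internal links goes undetected, here is a minimal Python sketch of link-following discovery. It is not Sitechecker's actual implementation: the `discover` function, the in-memory `site` map, and all page paths are hypothetical, and sitemap or Google Search Console sources are left out.

```python
from collections import deque


def discover(links, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site map: page -> internal links found on that page.
# "/orphan" exists, but no other page links to it.
site = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": [],
    "/orphan": ["/"],
}

found = discover(site, "/")
print(sorted(found))
print("/orphan" in found)
```

Starting from the homepage, the traversal never reaches `/orphan`, because discovery depends on incoming links, not outgoing ones.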

How to block SiteCheckerBotCrawler from scanning your website

There are a few ways to block SiteCheckerBotCrawler:

1. Block using robots.txt file

Add this content to the robots.txt file of your website.

User-agent: SiteCheckerBotCrawler
Disallow: /
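If you want to check the rule before deploying it, one option is Python's standard `urllib.robotparser` module. The sketch below is only an illustration: the `can_crawl` helper and the `SomeOtherBot` name are hypothetical, and the standard-library parser may not interpret every robots.txt edge case the same way SiteCheckerBotCrawler does.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, user_agent, path="/"):
    """Return True if `user_agent` may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# The blocking rule from this section.
BLOCK_RULE = """\
User-agent: SiteCheckerBotCrawler
Disallow: /
"""

# The rule applies to SiteCheckerBotCrawler only; other bots are unaffected.
print(can_crawl(BLOCK_RULE, "SiteCheckerBotCrawler", "/any-page"))
print(can_crawl(BLOCK_RULE, "SomeOtherBot", "/any-page"))
```

The same helper can be pointed at the text of your live robots.txt file to confirm which user agents a rule affects.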

2. Block using .htaccess file

Add this content to the .htaccess file of your website. Don't forget to replace yourdomain.com with your own domain!

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} SiteCheckerBotCrawler
RewriteRule ^.*$ "http://yourdomain.com" [R=301,L]

You can also block the bot by IP address. Check this guide to learn more about how to block bots via the .htaccess file.

3. Block using the firewall

If you use a web application firewall (WAF) to manage your incoming traffic, block SiteCheckerBotCrawler by creating a dedicated rule in the WAF. This guide is a good example of how to block bots using the Cloudflare Firewall.

How to allow SiteCheckerBotCrawler to scan your website

To allow SiteCheckerBotCrawler to scan the website, make sure that our bot isn't blocked by any of the methods described above.

1. Check the website's robots.txt file

Make sure that there is no disallow rule for the SiteCheckerBotCrawler user agent. If such a rule exists, replace it with the one below.

User-agent: SiteCheckerBotCrawler
Allow: /

2. Check the website's .htaccess file

Make sure that SiteCheckerBotCrawler isn't blocked in the .htaccess file by user agent or IP address. If you find that the bot is blocked, delete that rule. If you don't know how to work with the .htaccess file, contact your web developer or hosting provider.

3. Check rules in a web application firewall (if you are using one)

Make sure that the web application firewall (WAF) has no rule blocking SiteCheckerBotCrawler's requests to the website. Here, too, the bot can be blocked by user agent as well as by IP address. If you don't know how to work with the WAF, contact the support team of that service so they can delete the rule blocking SiteCheckerBotCrawler for you.
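As a rough way to see whether a user-agent rule is still in place, you can request a page yourself while sending the bot's user agent. The Python sketch below uses only the standard library and rests on several assumptions: the `probe` and `classify` helpers are hypothetical, the status codes treated as "probably blocked" are a guess at common blocking responses, and only the name and version of the user agent are sent because this article quotes the full string only in part. A request from your own machine cannot reveal blocks based on IP address.

```python
import urllib.error
import urllib.request

# Assumption: name and version only; the full user agent string
# is not reproduced in this article.
USER_AGENT = "SiteCheckerBotCrawler/1.0"


def classify(status):
    """Map an HTTP status code to a rough verdict (heuristic only)."""
    if 200 <= status < 300:
        return "allowed"
    if status in (401, 403, 406, 429, 503):
        return "probably blocked"
    return "inconclusive"


def probe(url, timeout=10.0):
    """Request `url` with the bot's user agent and classify the response."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify(response.status)
    except urllib.error.HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return classify(error.code)
```

Calling `probe` with a URL from your own website returns one of the three verdicts. Treat the result as a hint, since a firewall may respond differently to the real crawler than to a test request.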

How to allow SiteCheckerBotCrawler/1.0 on Cloudflare

1. Log in to your Cloudflare account

2. Select the account associated with the website

3. Then, select WAF.

4. Go to the Firewall rules tab.

5. Create a new Access Rule

6. Configure the action of the rule as ‘Allow’.

7. Select “User Agent” as the match criterion and enter our user agent string “SiteCheckerBotCrawler/1.0 (+” into the Match Value field

8. Set the Priority value to 3 (Medium) or above so that this rule is applied ahead of any rules you have already set up to restrict robots or crawlers

9. Click “Save” at the bottom of the page when done

10. Then crawl your website with Sitechecker to confirm that your adjustments worked. If the rule was applied correctly, SiteCheckerBotCrawler/1.0 will now be able to crawl your website.
