How to control Sitechecker's Web Crawler?

"Crawler" is a generic term for any program (such as a robot or spider) that is used to automatically discover and scan websites by following links from one webpage to another. Sitechecker's Web Crawler doesn't crawl all websites on the internet. It crawls only websites and pages that users requested to scan.

Parameters of Sitechecker's Web Crawler:

  • User-Agent: SiteCheckerBotCrawler/1.0 (+

Tools in Sitechecker Platform where SiteCheckerBotCrawler works:

  • Site Audit
  • Site Monitoring
  • On-Page Checker

How SiteCheckerBotCrawler scans your website

SiteCheckerBotCrawler's crawling process starts from a user request to crawl a specific domain or URL. 

In On-Page Checker and Page Audit, SiteCheckerBotCrawler scans only the specific URL and its internal and external links.

In Site Audit, SiteCheckerBotCrawler scans all URLs it finds on the website, starting from the homepage. So if a page on your website has no internal links pointing to it from other pages, the crawler won't detect it.
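To see why a page without internal links goes undetected, here is a minimal sketch of link-following discovery in Python. It is not Sitechecker's actual implementation; the `discover` function and the `site_links` map are hypothetical and only illustrate the general idea of starting at the homepage and following links.

```python
from collections import deque


def discover(links, start):
    """Return every page reachable from `start` by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: "/orphan" exists, but no other page links to it.
site_links = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1"],
    "/blog/post-1": ["/"],
    "/orphan": ["/"],
}

found = discover(site_links, "/")
missed = sorted(set(site_links) - found)
print(missed)  # ['/orphan']
```

Adding at least one internal link to such a page from any page the crawler already reaches makes it discoverable.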

How to block SiteCheckerBotCrawler from scanning your website

There are a few ways to block SiteCheckerBotCrawler:

1. Block using robots.txt file

Add this content to the robots.txt file of your website.

User-agent: SiteCheckerBotCrawler
Disallow: /
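If you want to confirm how these rules are read before publishing them, one option is Python's standard `urllib.robotparser` module. This is only a sketch: the `can_crawl` helper and the `SomeOtherBot` name are made up for illustration, and Python's parser may not interpret every edge case exactly the way SiteCheckerBotCrawler does.

```python
from urllib.robotparser import RobotFileParser

# The rules from this article, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: SiteCheckerBotCrawler
Disallow: /
"""


def can_crawl(robots_txt, user_agent, path="/"):
    """Parse robots.txt text and report whether `user_agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


bot_allowed = can_crawl(ROBOTS_TXT, "SiteCheckerBotCrawler")
other_allowed = can_crawl(ROBOTS_TXT, "SomeOtherBot")  # hypothetical bot name
print(bot_allowed, other_allowed)  # False True
```

The rule affects only the named user agent; other crawlers are still allowed unless they have their own rules.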

2. Block using .htaccess file

Add this content to the .htaccess file of your website. Don't forget to replace yourdomain.com with your domain!

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} SiteCheckerBotCrawler [NC]
RewriteRule ^.*$ http://yourdomain.com [R=301,L]

You can also block the bot by IP address. Check this guide to learn more about how to block bots via the .htaccess file.

3. Block using the firewall

If you are using a web application firewall (WAF) to manage your incoming traffic, block SiteCheckerBotCrawler by creating a specific rule in the WAF. This guide is a good example of how to block bots using the Cloudflare Firewall.

How to allow SiteCheckerBotCrawler to scan your website

To allow SiteCheckerBotCrawler to scan the website, make sure that our bot isn't blocked by any of the methods described above.

1. Check the website's robots.txt file

Make sure that there is no disallow rule for the SiteCheckerBotCrawler user agent. If such a rule exists, change it to the one below.

User-agent: SiteCheckerBotCrawler
Allow: /
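A robots.txt file that blocks all crawlers with a wildcard group can still admit one bot through a more specific group. The sketch below checks that with Python's standard `urllib.robotparser`; the wildcard block, the `/pricing` path, and the `SomeOtherBot` name are assumptions made up for the example, and Python's parser may differ from SiteCheckerBotCrawler in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everything is blocked for all bots,
# except for the group that names SiteCheckerBotCrawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: SiteCheckerBotCrawler
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The group that names the bot takes precedence over the wildcard group.
sitechecker_ok = parser.can_fetch("SiteCheckerBotCrawler", "/pricing")
other_ok = parser.can_fetch("SomeOtherBot", "/pricing")  # hypothetical bot name
print(sitechecker_ok, other_ok)  # True False
```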

2. Check the website's .htaccess file

Make sure that SiteCheckerBotCrawler isn't blocked in the .htaccess file by user agent or IP address. If you find that the bot is blocked, delete this rule. If you don't know how to work with the .htaccess file, contact your web developer or hosting provider.

3. Check rules in a web application firewall (if you are using one)

Make sure that there is no rule in the web application firewall (WAF) that blocks SiteCheckerBotCrawler requests to the website. The bot can be blocked there by user agent as well as by IP address. If you don't know how to work with the WAF, contact the support team of that service so they can delete the rule blocking SiteCheckerBotCrawler for you.
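If you are not sure whether a user-agent rule in .htaccess or the WAF is still in effect, one rough check is to request a page twice, once with the bot's user agent and once with a browser-like one, and compare the responses. The Python sketch below uses only the standard library. The helper names are hypothetical, `BOT_UA` is a shortened form of the real header, and the check cannot detect blocks based on IP address, because the request comes from your machine and not from Sitechecker's servers.

```python
import urllib.error
import urllib.request

BOT_UA = "SiteCheckerBotCrawler/1.0"  # assumption: shortened form of the full header
BROWSER_UA = "Mozilla/5.0"            # generic browser-like value for comparison


def build_request(url, user_agent):
    """Create a GET request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code, or None if no HTTP response was received."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 from a deny rule
    except OSError:
        return None      # connection refused, dropped, or timed out


def looks_blocked(bot_status, browser_status):
    """True when the browser-like request succeeds but the bot-like one does not."""
    browser_ok = browser_status is not None and 200 <= browser_status < 300
    bot_ok = bot_status is not None and 200 <= bot_status < 300
    return browser_ok and not bot_ok


# Usage (makes real network requests, so call it against your own site):
#   url = "https://yourdomain.com/"
#   print(looks_blocked(fetch_status(url, BOT_UA), fetch_status(url, BROWSER_UA)))
```

A result of `True` suggests that a rule keyed on the user agent is still active somewhere between the visitor and your site.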
