How to control Sitechecker's Web Crawler?
"Crawler" is a generic term for any program (such as a robot or spider) that automatically discovers and scans websites by following links from one webpage to another. Sitechecker's Web Crawler doesn't crawl every website on the internet. It crawls only the websites and pages that users have asked it to scan.
Parameters of Sitechecker's Web Crawler
- User-Agent: SiteCheckerBotCrawler/1.0 (+http://sitechecker.pro)
- IP address: 126.96.36.199
- Obeys robots.txt: Yes
- Obeys crawl-delay: No
Tools in Sitechecker Platform where SiteCheckerBotCrawler works:
- Site Audit
- Site Monitoring
- On-Page Checker
How SiteCheckerBotCrawler scans your website
SiteCheckerBotCrawler starts crawling when a user requests a scan of a specific domain or URL.
In On-Page Checker and Page Audit, SiteCheckerBotCrawler scans only the specified URL and its internal and external links.
In Site Audit, SiteCheckerBotCrawler scans all URLs it finds on the website, starting from the homepage. So if a page has no internal links pointing to it from other pages, the crawler won't detect it.
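The effect of link-following on page discovery can be illustrated with a small sketch. This is not Sitechecker's actual implementation: the breadth-first order, the `crawl` function, and the `SITE` link map are assumptions made up for the example. It only shows why a page with no inbound internal links is never reached.

```python
from collections import deque


def crawl(start, links):
    """Return every page reachable from `start` by following internal links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each key is a page, each value lists the pages it links to.
SITE = {
    "/": ["/blog", "/pricing"],
    "/blog": ["/blog/post-1", "/"],
    "/pricing": [],
    "/blog/post-1": [],
    "/orphan": ["/"],  # links out, but no other page links to it
}

found = crawl("/", SITE)
orphans = set(SITE) - found
print(sorted(orphans))  # ['/orphan']
```

To make such a page visible to a link-following crawler, add at least one internal link to it from a page that is already reachable from the homepage.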
How to block SiteCheckerBotCrawler from scanning your website
There are a few ways to block SiteCheckerBotCrawler:
1. Block using robots.txt file
Add this content to the robots.txt file of your website.
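As an illustration, a robots.txt group that blocks a single crawler usually names its user-agent token and disallows the whole site. The sketch below assumes the token is SiteCheckerBotCrawler (taken from the User-Agent string listed above) and uses Python's standard `urllib.robotparser` to check the rule before you deploy it. The helper name `is_allowed` and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: block only SiteCheckerBotCrawler, site-wide.
BLOCK_RULES = """\
User-agent: SiteCheckerBotCrawler
Disallow: /
"""

BOT_UA = "SiteCheckerBotCrawler/1.0 (+http://sitechecker.pro)"


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(BLOCK_RULES, BOT_UA, "https://example.com/page"))       # False
print(is_allowed(BLOCK_RULES, "Googlebot", "https://example.com/page"))  # True
```

Other crawlers are unaffected because the group applies only to the named user agent.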
2. Block using .htaccess file
Add this content to the .htaccess file of your website. Don't forget to replace yourdomain.com with your domain!
You can also block the bot by IP address. Check this guide to learn more about how to block bots via the .htaccess file.
3. Block using the firewall
If you use a web application firewall (WAF) to manage incoming traffic, block SiteCheckerBotCrawler by creating a dedicated rule in the WAF. This guide is a good example of how to block bots using the Cloudflare Firewall.
How to allow SiteCheckerBotCrawler to scan your website
To allow SiteCheckerBotCrawler to scan your website, make sure that our bot isn't blocked by any of the methods described above.
1. Check the website's robots.txt file
Make sure that there is no disallow rule for the SiteCheckerBotCrawler user agent. If such a rule exists, replace it with a rule that allows the bot.
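For example, an allow group for the bot might look like the rules in the sketch below. It assumes the user-agent token SiteCheckerBotCrawler and adds a hypothetical blanket `Disallow` for all other bots to show that the specific group still takes effect. The check uses Python's standard `urllib.robotparser`. The function name `bot_can_fetch` and the example.com URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content: everyone else is blocked, the bot is allowed.
ALLOW_RULES = """\
User-agent: *
Disallow: /

User-agent: SiteCheckerBotCrawler
Allow: /
"""


def bot_can_fetch(robots_txt, url,
                  user_agent="SiteCheckerBotCrawler/1.0 (+http://sitechecker.pro)"):
    """Return True if the given user agent may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(bot_can_fetch(ALLOW_RULES, "https://example.com/page"))  # True
print(bot_can_fetch(ALLOW_RULES, "https://example.com/page",
                    user_agent="SomeOtherBot"))                # False
```

You can paste your own robots.txt content into the string to confirm that no remaining rule blocks the bot.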
2. Check the website's .htaccess file
Make sure that SiteCheckerBotCrawler isn't blocked in the .htaccess file by user agent or IP address. If you find that the bot is blocked, delete the rule. If you don't know how to work with the .htaccess file, contact your web developer or hosting provider.
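If the .htaccess file is long, a short script can point you to the lines worth reviewing. The sketch below is a plain text search, not an Apache configuration parser. The `find_bot_rules` helper and the sample file content are assumptions for illustration. It lists the non-comment lines that mention the bot's user agent, and you can add the bot's IP address to the search terms in the same way.

```python
def find_bot_rules(htaccess_text, needles):
    """Return (line_number, line) pairs for non-comment lines containing any needle."""
    hits = []
    for number, line in enumerate(htaccess_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        lowered = stripped.lower()
        if any(needle.lower() in lowered for needle in needles):
            hits.append((number, stripped))
    return hits


# Hypothetical .htaccess content that blocks the bot by user agent.
SAMPLE_HTACCESS = """\
RewriteEngine On
# block a crawler by user agent
RewriteCond %{HTTP_USER_AGENT} SiteCheckerBotCrawler [NC]
RewriteRule .* - [F,L]
"""

for number, line in find_bot_rules(SAMPLE_HTACCESS, ["SiteCheckerBotCrawler"]):
    print(number, line)
```

A matching `RewriteCond` is usually followed by the `RewriteRule` that enforces the block, so review and remove both lines together.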
3. Check rules in a web application firewall (if you are using one)
Make sure that the web application firewall (WAF) has no rule blocking SiteCheckerBotCrawler requests to the website. Here, too, the bot can be blocked by user agent as well as by IP address. If you don't know how to work with the WAF, contact the support team of that service so they can delete the rule blocking SiteCheckerBotCrawler for you.