How to exclude and include URLs in Site Audit?

When do you need to use it?

The "Include and Exclude URLs" feature in the Site Audit settings streamlines the customization of website crawling without requiring changes to your robots.txt rules. It lets you specify which pages or domains to include in, or exclude from, the crawling process directly through a user-friendly interface.


Setting up rules for crawling

Accessing the feature

To access the "Include and Exclude URLs" feature, navigate to the Site Audit section of your project settings and look for the "Include and Exclude URLs" option.

You can create rules in two categories:

  • Include rules: Define which pages should be included in the crawl.
  • Exclude rules: Specify which pages should be explicitly excluded from the crawl.



How rules interact

  • Independent rules: When multiple rules are set, either in the "Include" or "Exclude" categories, they operate independently. This means the system will apply each rule separately to determine which URLs to crawl or exclude.
  • Priority of exclusion: If URLs fall under both "Include" and "Exclude" rules, the exclusion rules take precedence to ensure precise control over the crawling scope.
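The interaction described above can be sketched in code. This is a hypothetical illustration, not the tool's actual implementation: the function name, the use of glob-style patterns, and the matching logic are all assumptions made for clarity.

```python
# Hypothetical sketch of independent rules and exclusion precedence.
# Pattern syntax (glob-style wildcards) is an assumption for illustration.
from fnmatch import fnmatch

def should_crawl(url, include_rules, exclude_rules):
    """Decide whether a URL is crawled under include/exclude rules."""
    # Exclusion takes precedence: any matching exclude rule wins.
    if any(fnmatch(url, rule) for rule in exclude_rules):
        return False
    # With no include rules, everything in the domain scope is crawled.
    if not include_rules:
        return True
    # Include rules act as a filter: at least one must match.
    return any(fnmatch(url, rule) for rule in include_rules)

# A URL matching both an include and an exclude rule is excluded:
print(should_crawl("https://example.com/blog/post",
                   include_rules=["*/blog/*"],
                   exclude_rules=["*/blog/*"]))  # False
```

Note that each rule is checked on its own; rules are never combined, which is what "independent rules" means in practice.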

Types of rules available

Following recent updates and discussions, the available rule types have been simplified.


Changes to crawling behavior

  • Inclusion as a filter: Specifying "Include" rules acts as an additional filter to the domain scope, meaning only the URLs matching your defined rules are crawled.
  • Exclusion for specificity: By setting "Exclude" rules, you signal the crawler to omit those URLs, enhancing the focus of your site audit.
  • Respecting robots.txt: The feature respects the "Respect robots.txt rules" setting, meaning any "Include" rules conflicting with robots.txt are ignored when the setting is enabled and applied when it is disabled. In other words, robots.txt rules have the highest priority.
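The priority order above can be sketched as follows. This is a simplified, hypothetical model: the function name and the prefix-based disallow check are assumptions, and real robots.txt parsing is considerably more involved.

```python
# Hypothetical sketch: robots.txt sits above include rules when the
# "Respect robots.txt rules" setting is enabled. The disallowed-path
# check is a simplification of real robots.txt matching.
def is_crawlable(url, disallowed_paths, include_rules, respect_robots=True):
    # Highest priority: robots.txt, when the setting is enabled.
    # An "Include" rule cannot override a robots.txt disallow.
    if respect_robots and any(url.startswith(p) for p in disallowed_paths):
        return False
    # Include rules then act as an additional filter on the domain scope.
    if include_rules:
        return any(rule in url for rule in include_rules)
    return True
```

With `respect_robots=False`, a conflicting "Include" rule is considered again, mirroring the behavior of disabling the checkbox.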

Migration of old projects


Some existing projects, particularly those with custom robots.txt settings and the "Respect robots.txt rules" checkbox disabled, will be migrated to utilize the new rules system for enhanced accuracy and performance.


Still need help? Contact Us