Groups & user-agents
A robots.txt file is a set of groups. Each group starts with one or more
User-agent lines, followed by Allow and Disallow rules.
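A crawler obeys only one group: the one whose user-agent is the most specific match for its name.
If several groups name the same user-agent, Google combines their rules into a single group.
A bot with its own group ignores the global * group entirely; the two are never merged.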
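The group-selection step can be sketched in Python. This is a minimal sketch, not Google's parser. It assumes a crawler is described by a hypothetical list of product tokens ordered from most to least specific (for example ["googlebot-news", "googlebot"]), matches User-agent values case-insensitively, combines groups that name the same agent, and ignores every field except User-agent, Allow and Disallow. The names parse_groups and select_rules are illustrative only.

```python
def parse_groups(robots_txt):
    """Map each lowercased User-agent value to its combined list of
    (directive, path) rules. Groups naming the same agent are merged."""
    groups = {}
    current_agents = []
    seen_rule = False  # has the current group had an Allow/Disallow yet?
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                current_agents, seen_rule = [], False
            agent = value.lower()
            current_agents.append(agent)
            groups.setdefault(agent, [])  # a group with no rules still counts
        elif field in ("allow", "disallow"):
            seen_rule = True
            for agent in current_agents:
                groups[agent].append((field, value))
    return groups


def select_rules(groups, product_tokens):
    """Return the rules of the most specific matching group.
    product_tokens is a hypothetical list, most specific first.
    The * group is used only when no named group matches."""
    for token in product_tokens:
        if token.lower() in groups:
            return groups[token.lower()]
    return groups.get("*", [])


robots = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /tmp/

User-agent: googlebot
Allow: /tmp/public/
"""
groups = parse_groups(robots)
print(select_rules(groups, ["googlebot-news", "googlebot"]))
# [('disallow', '/tmp/'), ('allow', '/tmp/public/')]
print(select_rules(groups, ["otherbot"]))
# [('disallow', '/private/')]
```

Note that the Googlebot rules come back without Disallow: /private/: once a named group matches, nothing from the * group is added.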
Most specific rule wins
Within the matching group, the rule with the longest matching path wins, not the first one listed,
so the order of rules does not matter. When a matching Allow and a matching Disallow have paths of
the same length, the Allow wins, because Google uses the least restrictive rule.
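The precedence rule can be sketched as a single pass over the group's rules. This is a minimal sketch under a simplifying assumption: patterns are treated as plain prefixes (no * or $), and rules are (directive, path) tuples such as a parser might produce. The function name is_allowed is hypothetical.

```python
def is_allowed(rules, path):
    """Decide whether path is crawlable under one group's rules.

    rules: list of ("allow" | "disallow", pattern) tuples. Patterns are treated
    as plain prefixes here; wildcards are deliberately not handled.
    """
    best_length = -1
    best_is_allow = True  # no matching rule at all means the URL is allowed
    for directive, pattern in rules:
        if not pattern or not path.startswith(pattern):
            continue  # empty patterns (a bare "Disallow:") match nothing
        is_allow = directive == "allow"
        # A longer pattern always wins; on a tie in length, Allow wins.
        if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
            best_length = len(pattern)
            best_is_allow = is_allow
    return best_is_allow


rules = [("allow", "/shop/sale/"), ("disallow", "/shop/")]
print(is_allowed(rules, "/shop/sale/shoes"))  # True: /shop/sale/ is longer
print(is_allowed(rules, "/shop/cart"))        # False: only /shop/ matches
print(is_allowed(list(reversed(rules)), "/shop/sale/shoes"))  # True: order is irrelevant
```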
Wildcards: * and $
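* matches any sequence of characters (including none), and $ anchors the end of the URL.
Without $, a rule is a prefix match: Disallow: /fish also blocks /fish.html and /fishheads.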
So Disallow: /*.pdf$ blocks /file.pdf but not /file.pdf?x=1,
because the query string means the URL no longer ends in .pdf.
Rules are matched against the path plus query string, and paths are case-sensitive.
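One way to implement * and $ is to translate each pattern into a regular expression anchored at the start of the URL. This is a minimal sketch, assuming the input is the path plus query string (for example /file.pdf?x=1) and that only a trailing $ is special; the function name rule_matches is hypothetical.

```python
import re


def rule_matches(pattern, path_and_query):
    """Return True if a robots.txt path pattern matches the path plus query."""
    if not pattern:
        return False  # an empty rule (a bare "Disallow:") matches nothing
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each * into ".*".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += r"\Z"  # $ means the URL must end here
    # re.match anchors at the start only, so unanchored rules act as prefixes.
    # No IGNORECASE flag: matching is case-sensitive, like robots.txt paths.
    return re.match(regex, path_and_query) is not None


print(rule_matches("/*.pdf$", "/file.pdf"))      # True
print(rule_matches("/*.pdf$", "/file.pdf?x=1"))  # False: URL ends in ?x=1
print(rule_matches("/fish", "/fish.html"))       # True: prefix match
print(rule_matches("/fish", "/Fish.html"))       # False: case-sensitive
```

Using \Z rather than $ in the regex avoids Python's $ also matching just before a trailing newline.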
Common mistakes
An empty Disallow: allows everything, while Disallow: / blocks the whole site.
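Blocking a URL in robots.txt does not remove it from Google; it only stops crawling. To de-index a
page, use noindex and leave the page crawlable, because Google cannot see a noindex on a page it is
not allowed to fetch. AdsBot and AdSense crawlers ignore the * group, so they must be named in their
own group to be restricted.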
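The difference between an empty Disallow: and Disallow: / can be checked with Python's standard-library urllib.robotparser, which agrees with Google on these two cases, though not on every rule. The helper name fetchable and the agent name examplebot are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def fetchable(robots_txt, url, agent="examplebot"):
    """Parse robots_txt from a string and ask whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


allow_everything = "User-agent: *\nDisallow:\n"
block_everything = "User-agent: *\nDisallow: /\n"

print(fetchable(allow_everything, "https://example.com/any/page"))  # True
print(fetchable(block_everything, "https://example.com/any/page"))  # False
```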
This tool applies Google's documented robots.txt rules. Other search engines and AI crawlers
mostly follow the same logic, but some treat wildcards and crawl-delay differently (Google ignores
crawl-delay entirely), so always confirm critical rules against each crawler's own documentation.
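Such differences show up even in common libraries. Python's standard-library urllib.robotparser, for instance, returns the first matching rule in file order rather than the longest one, and it reports Crawl-delay, which Google ignores. The sketch below feeds it a rule set where Google's longest-match rule would allow /shop/sale/ URLs; examplebot is a hypothetical agent name.

```python
from urllib.robotparser import RobotFileParser

robots = """\
User-agent: *
Crawl-delay: 10
Disallow: /shop/
Allow: /shop/sale/
"""

parser = RobotFileParser()
parser.parse(robots.splitlines())

# Google applies the longer Allow: /shop/sale/ here, but robotparser stops at
# the first rule that matches, Disallow: /shop/, so the URL is reported blocked.
print(parser.can_fetch("examplebot", "https://example.com/shop/sale/shoes"))  # False

# Google ignores Crawl-delay; robotparser exposes it.
print(parser.crawl_delay("examplebot"))  # 10
```

Swapping the Allow and Disallow lines would flip robotparser's answer while leaving Google's unchanged, which is exactly the kind of rule worth confirming per crawler.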