What it is
Traditional robots.txt rules were written for search crawlers (Googlebot, Bingbot, etc.). AI agents introduced a new class of user-agents whose behaviour (training, grounding, live assistance) is qualitatively different from search. This check asks: has the site owner made any explicit choice about AI bots? The answer being "yes, I blocked all of them" is fine; the answer being "no, I never thought about it" is not.
Why it matters
- Default `User-agent: *` rules often allow everything unintentionally.
- AI vendors publish their bot names precisely so owners can opt in or out.
- Sites without AI stanzas are effectively consenting to whatever vendors decide to do.
How we test it
Re-use the robots.txt body fetched by check 01. Parse user-agent groups.
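A minimal parsing sketch (function and type names here are hypothetical, not the scanner's actual API): split the fetched body into user-agent groups, merging consecutive `User-agent:` lines into a single group as RFC 9309 describes.

```typescript
// Sketch only — names are illustrative, not the real scanner API.
interface RobotsGroup {
  userAgents: string[];                                  // lowercased UA tokens
  rules: { directive: "allow" | "disallow"; path: string }[];
}

function parseRobotsGroups(body: string): RobotsGroup[] {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let lastWasUa = false;
  for (const raw of body.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim();         // strip comments
    if (!line) continue;
    const m = line.match(/^([A-Za-z-]+)\s*:\s*(.*)$/);
    if (!m) continue;
    const [, field, value] = m;
    const key = field.toLowerCase();
    if (key === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasUa || !current) {
        current = { userAgents: [], rules: [] };
        groups.push(current);
      }
      current.userAgents.push(value.toLowerCase());
      lastWasUa = true;
    } else if ((key === "allow" || key === "disallow") && current) {
      current.rules.push({ directive: key, path: value });
      lastWasUa = false;
    } else {
      lastWasUa = false;
    }
  }
  return groups;
}
```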
Pass Warn Fail Matrix
| Condition | Status | Score |
|---|---|---|
| ≥1 group targets a recognised AI bot UA with a meaningful `Allow:` or `Disallow:` | pass | 1.0 |
| Site blanket-blocks all bots (`User-agent: *` / `Disallow: /`), which counts as explicit | pass | 1.0 |
| Only generic `*` rules, no AI-bot-specific stanza | fail | 0.0 |
| No robots.txt at all | fail | 0.0 |
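The matrix above could be evaluated roughly like this (a sketch: the group shape and the `AI_BOTS` subset are assumptions; the real list lives in `src/scanner/checks/data/ai-bots.ts`):

```typescript
// Illustrative subset — the maintained list is in src/scanner/checks/data/ai-bots.ts.
const AI_BOTS = ["gptbot", "claudebot", "perplexitybot", "google-extended", "ccbot"];

interface Group { userAgents: string[]; rules: { directive: string; path: string }[] }

function scoreAiBotRules(groups: Group[] | null): { status: "pass" | "fail"; score: number } {
  // No robots.txt at all → fail.
  if (!groups || groups.length === 0) return { status: "fail", score: 0 };
  // A stanza naming a known AI bot with at least one rule → pass.
  const hasAiStanza = groups.some(
    (g) => g.userAgents.some((ua) => AI_BOTS.includes(ua)) && g.rules.length > 0
  );
  if (hasAiStanza) return { status: "pass", score: 1.0 };
  // Blanket block of all bots counts as an explicit choice → pass.
  const blanketBlock = groups.some(
    (g) =>
      g.userAgents.includes("*") &&
      g.rules.some((r) => r.directive === "disallow" && r.path === "/")
  );
  if (blanketBlock) return { status: "pass", score: 1.0 };
  // Only generic rules, no AI-specific stanza → fail.
  return { status: "fail", score: 0 };
}
```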
Sub Tests
| id | Weight | Pass when |
|---|---|---|
| has-ai-bot-stanza | 1.0 | ≥1 stanza for a known AI bot (list in check 01) |
Recognised AI Bots (curated list, kept in code)
aideBot
Amazonbot
anthropic-ai
Applebot-Extended
Bytespider
CCBot
ChatGPT-User
ChatGPT
ClaudeBot
Claude-Web
Claude-User
cohere-ai
DuckAssistBot
Diffbot
FacebookBot
Google-Extended
GPTBot
ImagesiftBot
Meta-ExternalAgent
OAI-SearchBot
PerplexityBot
Timpibot
YouBot
Maintain in src/scanner/checks/data/ai-bots.ts. Source: darkvisitors.com. Update quarterly.
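One possible shape for that data module (a sketch, not the actual file; vendor attributions are illustrative and only part of the list is shown):

```typescript
// src/scanner/checks/data/ai-bots.ts — sketch of a possible shape.
// Matching should be case-insensitive, so a lowercased token set is derived.
export interface AiBot {
  token: string;    // UA token as published by the vendor
  vendor?: string;  // optional, for reporting
}

export const AI_BOTS: AiBot[] = [
  { token: "GPTBot", vendor: "OpenAI" },
  { token: "ChatGPT-User", vendor: "OpenAI" },
  { token: "OAI-SearchBot", vendor: "OpenAI" },
  { token: "ClaudeBot", vendor: "Anthropic" },
  { token: "Claude-User", vendor: "Anthropic" },
  { token: "Google-Extended", vendor: "Google" },
  { token: "Applebot-Extended", vendor: "Apple" },
  { token: "PerplexityBot", vendor: "Perplexity" },
  { token: "CCBot", vendor: "Common Crawl" },
  // …remaining tokens from the list above
];

export const AI_BOT_TOKENS = new Set(AI_BOTS.map((b) => b.token.toLowerCase()));
```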
Remediation Prompt
Please update my /robots.txt to include explicit rules for AI bots. I want the file to make a considered statement about what AI training / grounding / assistance is allowed on my content.
Add at least these stanzas (keeping my existing rules for traditional crawlers intact):
User-agent: GPTBot
Allow: / # or Disallow: / if I want to block OpenAI
User-agent: ClaudeBot
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: Applebot-Extended
Allow: /
User-agent: CCBot
Disallow: / # CCBot builds training datasets sold to LLM vendors — opt-out is common
Choose Allow or Disallow per bot according to my policy. Prefer explicit choices over blanket rules.
Also consider adding a Content-Signal directive (see check 07 — content-signals) for finer-grained control.
Implementation Examples
Same file as check 01; this check asks for specific stanzas inside.
Common Mistakes
- Pretending `User-agent: *` is enough: AI vendors often do not apply the generic `*` group to their bots, so name each one explicitly.
- Conflicting rules (`Allow: /` plus `Disallow: /private`). Per RFC 9309 the rule with the longest matching path wins, regardless of where it appears in the file, so verify precedence rather than relying on ordering.
- Typos in UA names: `GTPBot`, `Cloudbot`, `PerplexityAI` match nothing, so the stanza is silently ignored.
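The typo mistake can be caught mechanically. A hedged sketch (helper names hypothetical, not part of the scanner): flag UA tokens within Levenshtein distance 2 of a known AI bot name as likely misspellings.

```typescript
// Standard Levenshtein edit distance via dynamic programming.
function editDistance(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                  // deletion
        dp[i][j - 1] + 1,                                  // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
  return dp[a.length][b.length];
}

const KNOWN = ["gptbot", "claudebot", "perplexitybot"]; // subset for illustration

// Returns the likely intended bot name, or null if the token is exact or unrelated.
function suggestBotName(ua: string): string | null {
  const lc = ua.toLowerCase();
  if (KNOWN.includes(lc)) return null;                    // exact match, not a typo
  return KNOWN.find((k) => editDistance(lc, k) <= 2) ?? null;
}
```

Note that a distance threshold of 2 catches `GTPBot` and `Cloudbot` but not `PerplexityAI`, which differs by a whole suffix; a suffix-aware check would be needed for that class of mistake.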
Test Fixtures
- pass-gptbot-allow.txt
- pass-blanket-block.txt
- fail-no-ai-stanza.txt
- fail-typo-ua.txt (the stanza targets `GTPBot`, a typo)