Content Signals

Does `robots.txt` include a `Content-Signal` directive declaring AI training / input / search preferences?

What it is

Content Signals is a 2025 proposal that extends robots.txt with a Content-Signal directive. It separates three independent decisions:

  • ai-train — may this content be used as training data?
  • ai-input — may this content be used as live input to inference (RAG, grounding, assistant context)?
  • search — may this content be indexed for search?

Each key takes the value yes or no, and key=value pairs are separated by commas.

    User-agent: *
    Content-Signal: search=yes, ai-input=yes, ai-train=no
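
To make the syntax concrete, here is a minimal Python sketch that turns one directive line into a key/value mapping. The function name parse_content_signal is hypothetical, and matching the field name and values case-insensitively is an assumption of this sketch rather than something this page specifies.

```python
def parse_content_signal(line: str) -> dict[str, str]:
    """Parse 'Content-Signal: k=v, k=v' into a {key: value} dict (lowercased)."""
    name, sep, value = line.partition(":")
    # Assumption: the field name is matched case-insensitively.
    if not sep or name.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")
    signals: dict[str, str] = {}
    for pair in value.split(","):
        key, eq, val = pair.partition("=")
        if eq:  # ignore fragments that have no '='
            signals[key.strip().lower()] = val.strip().lower()
    return signals


print(parse_content_signal("Content-Signal: search=yes, ai-input=yes, ai-train=no"))
# {'search': 'yes', 'ai-input': 'yes', 'ai-train': 'no'}
```

The sketch does not validate keys or values; it only splits the line, so a separate step would decide whether each value is yes or no.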

Why it matters

  • Lets owners make the training/inference distinction that a simple Allow/Disallow cannot.
  • Publishers such as Reuters, AP, and other major news sites already use it.
  • Complements bot-specific rules.

How we test it

Re-use /robots.txt from check 01. Look for Content-Signal: lines inside any User-agent: group.
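
The lookup described above could be sketched roughly as follows. This is an illustration, not the checker's actual code: find_content_signals is a hypothetical name, and the grouping logic (a new group starts when a User-agent: line follows a rule line, and Sitemap: lines belong to no group) is an assumption based on common robots.txt parsing behaviour.

```python
def find_content_signals(robots_txt: str) -> list[tuple[list[str], str]]:
    """Return (user_agents, raw_value) for each Content-Signal line inside a group."""
    found = []
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a User-agent after rules starts a new group
                agents, in_rules = [], False
            agents.append(value)
        elif field == "sitemap":
            continue  # assumption: Sitemap lines are not part of any group
        elif agents:
            in_rules = True
            if field == "content-signal":
                found.append((list(agents), value))
    return found


SAMPLE = """\
Content-Signal: search=no
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Disallow: /private/

User-agent: ExampleBot
Content-Signal: ai-train=yes
"""
print(find_content_signals(SAMPLE))
```

In this sample the first line is ignored because it appears before any User-agent: group, so two directives are returned: one for * and one for the made-up ExampleBot.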

Pass Warn Fail Matrix

Condition | Status | Score
≥1 Content-Signal: with at least 2 of the 3 keys explicitly set | pass | 1.0
Content-Signal: present, but only 1 key set or values outside {yes, no} | warn | 0.5
No Content-Signal: anywhere | fail | 0.0
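
One possible reading of this matrix, sketched in Python. The matrix does not say how to treat a directive that mixes valid and invalid values, so this sketch assumes a directive only earns a pass when it sets at least two of the known keys and every one of those values is yes or no. The name classify and the input shape (a list of already-parsed key/value dicts) are hypothetical.

```python
KEYS = {"ai-train", "ai-input", "search"}
VALID = {"yes", "no"}


def classify(directives: list[dict[str, str]]) -> tuple[str, float]:
    """Map parsed Content-Signal directives to (status, score)."""
    if not directives:
        return ("fail", 0.0)  # no Content-Signal anywhere
    for directive in directives:
        known = {k: v for k, v in directive.items() if k in KEYS}
        # Assumption: a pass needs >= 2 known keys, all with yes/no values.
        if len(known) >= 2 and all(v in VALID for v in known.values()):
            return ("pass", 1.0)
    return ("warn", 0.5)  # present, but too few keys or invalid values


print(classify([{"search": "yes", "ai-train": "no"}]))  # ('pass', 1.0)
print(classify([{"search": "yes"}]))                    # ('warn', 0.5)
print(classify([]))                                     # ('fail', 0.0)
```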

Sub Tests

id | Weight | Pass when
directive-present | 0.6 | At least one valid directive
complete | 0.4 | All three keys (ai-train, ai-input, search) set explicitly
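
A rough sketch of how these two sub-tests might be evaluated. It assumes that the overall score is the sum of the weights of the passing sub-tests, and that a "valid" directive is one with at least one known key whose known values are all yes or no; neither detail is stated on this page. run_subtests is a hypothetical name.

```python
KEYS = {"ai-train", "ai-input", "search"}
VALID = {"yes", "no"}
WEIGHTS = {"directive-present": 0.6, "complete": 0.4}


def is_valid(directive: dict[str, str]) -> bool:
    """Assumption: valid means >= 1 known key and only yes/no for known keys."""
    known = [v for k, v in directive.items() if k in KEYS]
    return bool(known) and all(v in VALID for v in known)


def run_subtests(directives: list[dict[str, str]]) -> tuple[dict[str, bool], float]:
    """Return per-sub-test results and the weighted score."""
    results = {
        "directive-present": any(is_valid(d) for d in directives),
        "complete": any(is_valid(d) and KEYS.issubset(d) for d in directives),
    }
    # Assumption: the score is the sum of the weights of passing sub-tests.
    score = sum(WEIGHTS[name] for name, ok in results.items() if ok)
    return results, score


results, score = run_subtests([{"search": "yes", "ai-input": "yes", "ai-train": "no"}])
print(results, round(score, 2))
```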

Remediation Prompt

Please add a Content-Signal directive to my /robots.txt. It should sit under at least the User-agent: * group and declare all three preferences (search, ai-input, ai-train):

    User-agent: *
    Content-Signal: search=<yes|no>, ai-input=<yes|no>, ai-train=<yes|no>

Typical choices for content/publisher sites:
    Content-Signal: search=yes, ai-input=yes, ai-train=no
(Allow search, allow AI-assisted inference on my content, disallow use as training data.)

Typical choices for docs or open projects:
    Content-Signal: search=yes, ai-input=yes, ai-train=yes

You may add Content-Signal under specific bot user-agents too, to vary per bot.
Do not modify existing Allow/Disallow rules.
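
As an illustration of the change this prompt asks for, the following Python sketch inserts the directive into the User-agent: * group and leaves every existing line untouched. The function add_content_signal is hypothetical, and two behaviours are assumptions of the sketch: it does nothing when any Content-Signal line already exists, and it appends a new User-agent: * group when none is found.

```python
def add_content_signal(robots_txt: str, directive: str) -> str:
    """Insert `directive` under the 'User-agent: *' group without editing other lines."""
    lines = robots_txt.splitlines()

    def field(line: str) -> str:
        return line.partition(":")[0].strip().lower()

    # Assumption: if a Content-Signal line already exists, leave the file alone.
    if any(field(line) == "content-signal" for line in lines):
        return robots_txt

    for i, line in enumerate(lines):
        if field(line) == "user-agent" and line.partition(":")[2].strip() == "*":
            j = i + 1
            # Skip further User-agent lines so a shared group is not split.
            while j < len(lines) and field(lines[j]) == "user-agent":
                j += 1
            lines.insert(j, directive)
            break
    else:
        # Assumption: with no wildcard group present, append a new one.
        if lines and lines[-1].strip():
            lines.append("")
        lines += ["User-agent: *", directive]
    return "\n".join(lines) + "\n"


before = "User-agent: *\nDisallow: /private/\n"
print(add_content_signal(before, "Content-Signal: search=yes, ai-input=yes, ai-train=no"))
```

Running the function a second time on its own output returns the text unchanged, which keeps the edit safe to repeat.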

Implementation Examples

See the robots.txt examples in check 01 and add the Content-Signal: line inside the relevant User-agent: group.

Common Mistakes

  • Writing Content-Signals: (plural). The spec uses the singular Content-Signal:.
  • Using values other than yes/no (for example 1/0 or true/false).
  • Omitting the commas between key=value pairs.
  • Placing the directive outside a User-agent: group. It must sit inside one.
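
The mistakes listed above can be detected mechanically. The Python sketch below is one hedged way to do it: lint_content_signals is a hypothetical helper, the messages are illustrative, and treating two "=" signs in a single comma-separated fragment as a missing comma is a heuristic chosen for this sketch.

```python
def lint_content_signals(robots_txt: str) -> list[str]:
    """Report the common Content-Signal mistakes found in a robots.txt body."""
    problems = []
    in_group = False  # becomes True after the first User-agent line
    for n, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep:
            continue
        field = field.strip().lower()
        if field == "user-agent":
            in_group = True
        elif field == "content-signals":
            problems.append(f"line {n}: 'Content-Signals' should be singular")
        elif field == "content-signal":
            if not in_group:
                problems.append(f"line {n}: directive is outside a User-agent group")
            for pair in value.split(","):
                pair = pair.strip()
                if not pair:
                    continue
                if pair.count("=") > 1:  # heuristic: two pairs ran together
                    problems.append(f"line {n}: missing comma in {pair!r}")
                    continue
                key, _, val = pair.partition("=")
                if val.strip().lower() not in ("yes", "no"):
                    problems.append(
                        f"line {n}: {key.strip()} has value {val.strip()!r}, expected yes or no"
                    )
    return problems


BAD = """\
Content-Signal: search=yes
User-agent: *
Content-Signals: search=yes
Content-Signal: search=true, ai-input=yes ai-train=no
"""
for problem in lint_content_signals(BAD):
    print(problem)
```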

Test Fixtures

  • pass-all-three.txt
  • warn-only-one-key.txt
  • warn-wrong-value.txt
  • fail-missing.txt