Content Signals — how to make your site agent-ready

What it is

Content Signals is a 2025 proposal, introduced by Cloudflare, that extends robots.txt with a Content-Signal directive. It separates three independent decisions:

ai-train — may this content be used as training data?
ai-input — may this content be used as live input to inference (RAG, grounding, assistant context)?
search — may this content be indexed for search?

Each key takes the value yes or no, and key=value pairs are separated by commas.

User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
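
To make the syntax concrete, here is a minimal Python sketch that turns one such line into a dictionary. The function name `parse_content_signal` is hypothetical, and the case-insensitive matching of the field name and values is an assumption rather than something this page specifies.

```python
def parse_content_signal(line: str) -> dict[str, str]:
    """Parse a line such as 'Content-Signal: search=yes, ai-train=no' into a dict."""
    name, _, value = line.partition(":")
    # Assumption: the field name is matched case-insensitively.
    if name.strip().lower() != "content-signal":
        raise ValueError(f"not a Content-Signal line: {line!r}")
    signals: dict[str, str] = {}
    for pair in value.split(","):
        key, sep, val = pair.partition("=")
        if not sep:
            continue  # skip empty fragments or fragments without '='
        signals[key.strip().lower()] = val.strip().lower()
    return signals


print(parse_content_signal("Content-Signal: search=yes, ai-input=yes, ai-train=no"))
```

The sketch does not validate keys or values; it only splits the line, so checks such as "is the value yes or no" can be layered on top.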

Why it matters

It lets site owners distinguish training use from inference use, which plain Allow/Disallow rules cannot express.
Major publishers (Reuters, AP, and other large news sites) already use it.
It complements bot-specific rules rather than replacing them.

How we test it

Reuse the /robots.txt fetched in check 01 and look for Content-Signal: lines inside any User-agent: group.
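
The lookup could be sketched roughly as follows. This is an illustration, not the checker's actual code: `find_content_signals` is a hypothetical name, and the grouping rule (consecutive User-agent lines share one group, and the next User-agent line after any rule starts a new group) is an assumption based on common robots.txt parsing. `ExampleBot` is a made-up user-agent.

```python
def find_content_signals(robots_txt: str) -> list[tuple[list[str], str]]:
    """Return (user_agents, raw_value) for each Content-Signal line inside a group."""
    found: list[tuple[list[str], str]] = []
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once a non-User-agent line has been seen
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_rules:     # a User-agent line after rules opens a new group
                agents, in_rules = [], False
            agents.append(value)
        else:
            in_rules = True
            # Only count the directive when it sits inside a User-agent group.
            if field == "content-signal" and agents:
                found.append((list(agents), value))
    return found


sample = """\
Content-Signal: search=yes
User-agent: *
Content-Signal: search=yes, ai-input=yes, ai-train=no
Disallow: /private/

User-agent: ExampleBot
Content-Signal: ai-train=no
"""
for agents, value in find_content_signals(sample):
    print(agents, value)
```

The first line of the sample sits outside any group, so this sketch ignores it and reports only the two directives that follow a User-agent line.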

Pass / Warn / Fail Matrix

Condition	Status	Score
≥1 `Content-Signal:` with at least 2 of the 3 keys explicitly set	pass	1.0
`Content-Signal:` present, but with only 1 key set or a value outside `{yes,no}`	warn	0.5
No `Content-Signal:` anywhere	fail	0.0
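
One way to read the matrix as code is sketched below. The function name `classify` and the input shape (a list of already parsed key/value dictionaries, one per directive) are hypothetical. The sketch also assumes that a directive counts toward a pass only if every recognized key in it has a yes or no value, which the matrix does not spell out.

```python
KEYS = {"ai-train", "ai-input", "search"}
VALID = {"yes", "no"}


def classify(directives: list[dict[str, str]]) -> tuple[str, float]:
    """Map parsed Content-Signal directives to a (status, score) pair."""
    if not directives:
        return ("fail", 0.0)          # no Content-Signal line anywhere
    for directive in directives:
        known = {k: v for k, v in directive.items() if k in KEYS}
        # Assumption: all recognized keys must carry yes/no for a pass.
        if len(known) >= 2 and all(v in VALID for v in known.values()):
            return ("pass", 1.0)
    return ("warn", 0.5)              # present, but incomplete or malformed


print(classify([{"search": "yes", "ai-input": "yes", "ai-train": "no"}]))
print(classify([{"ai-train": "no"}]))
print(classify([]))
```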

Sub Tests

id	Weight	Pass when
`directive-present`	0.6	At least one valid directive
`complete`	0.4	All three keys (`ai-train`, `ai-input`, `search`) set explicitly
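
The two sub-tests and their weights can be combined as in the following sketch. The helper names are hypothetical, and two points are assumptions: a directive is treated as "valid" when it sets at least one recognized key and all recognized keys have yes or no values, and `complete` is evaluated within a single directive. How the weighted total relates to the matrix score above is not stated here, so the sketch simply sums the weights of the passing sub-tests.

```python
WEIGHTS = {"directive-present": 0.6, "complete": 0.4}
KEYS = ("ai-train", "ai-input", "search")
VALID = {"yes", "no"}


def is_valid(directive: dict[str, str]) -> bool:
    """Assumed definition: at least one known key, and every known key is yes/no."""
    known = [v for k, v in directive.items() if k in KEYS]
    return bool(known) and all(v in VALID for v in known)


def sub_test_results(directives: list[dict[str, str]]) -> dict[str, bool]:
    """Evaluate both sub-tests over the parsed directives."""
    return {
        "directive-present": any(is_valid(d) for d in directives),
        "complete": any(is_valid(d) and all(k in d for k in KEYS) for d in directives),
    }


def weighted_score(results: dict[str, bool]) -> float:
    """Sum the weights of the sub-tests that passed."""
    return sum(WEIGHTS[name] for name, ok in results.items() if ok)


results = sub_test_results([{"search": "yes", "ai-train": "no"}])
print(results, weighted_score(results))
```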

Remediation Prompt

Please add a Content-Signal directive to my /robots.txt. It should appear in the User-agent: * group at a minimum and declare three content-use preferences:

    User-agent: *
    Content-Signal: search=<yes|no>, ai-input=<yes|no>, ai-train=<yes|no>

Typical choices for content/publisher sites:
    Content-Signal: search=yes, ai-input=yes, ai-train=no
(Allow search, allow AI-assisted inference on my content, disallow use as training data.)

Typical choices for docs or open projects:
    Content-Signal: search=yes, ai-input=yes, ai-train=yes

You may add Content-Signal under specific bot user-agents too, to vary per bot.
Do not modify existing Allow/Disallow rules.
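
The edit this prompt asks for can be sketched in Python as follows. It is a minimal illustration under assumptions: `add_content_signal` is a hypothetical helper, the default signal is the publisher-style choice shown above, and the file is left unchanged if any Content-Signal line already exists. Existing Allow/Disallow lines are never rewritten; the new line is only inserted.

```python
DEFAULT_SIGNAL = "search=yes, ai-input=yes, ai-train=no"


def field_name(line: str) -> str:
    """Lower-cased field name of a robots.txt line, ignoring comments."""
    return line.split("#", 1)[0].partition(":")[0].strip().lower()


def add_content_signal(robots_txt: str, signal: str = DEFAULT_SIGNAL) -> str:
    """Insert a Content-Signal line into the User-agent: * group."""
    lines = robots_txt.splitlines()
    if any(field_name(l) == "content-signal" for l in lines):
        return robots_txt                      # already present: leave as-is
    new_line = f"Content-Signal: {signal}"
    for i, raw in enumerate(lines):
        value = raw.split("#", 1)[0].partition(":")[2].strip()
        if field_name(raw) == "user-agent" and value == "*":
            j = i + 1
            # Skip further User-agent lines that share this group.
            while j < len(lines) and field_name(lines[j]) == "user-agent":
                j += 1
            lines.insert(j, new_line)
            break
    else:
        # No wildcard group yet: append one at the end of the file.
        if lines and lines[-1].strip():
            lines.append("")
        lines += ["User-agent: *", new_line]
    return "\n".join(lines) + "\n"


print(add_content_signal("User-agent: *\nDisallow: /admin/\n"))
```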

Implementation Examples

See the robots.txt examples in check 01 and add the Content-Signal: line to the relevant User-agent: group.

Common Mistakes

Writing Content-Signals: (plural); the spec uses the singular form.
Using values other than yes/no (for example 1/0 or true/false).
Omitting the commas between key=value pairs.
Placing the directive outside a User-agent: group; it must be inside one.
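
The four mistakes above can be detected mechanically. The following sketch is a hypothetical linter, not part of any official tooling: `lint_robots` and its message wording are made up for illustration, and it assumes that every line after the first User-agent line belongs to some group.

```python
def lint_robots(robots_txt: str) -> list[str]:
    """Report the common Content-Signal mistakes found in a robots.txt body."""
    problems: list[str] = []
    in_group = False  # becomes True after the first User-agent line
    for n, raw in enumerate(robots_txt.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()
        if field == "user-agent":
            in_group = True
        elif field == "content-signals":
            problems.append(f"line {n}: plural 'Content-Signals', use 'Content-Signal'")
        elif field == "content-signal":
            if not in_group:
                problems.append(f"line {n}: directive outside a User-agent group")
            for pair in value.split(","):
                pair = pair.strip()
                if pair.count("=") != 1:
                    # Two '=' in one fragment usually means a missing comma.
                    problems.append(f"line {n}: malformed pair {pair!r}")
                    continue
                key, val = (part.strip() for part in pair.split("="))
                if val.lower() not in ("yes", "no"):
                    problems.append(f"line {n}: {key} has value {val!r}, expected yes or no")
    return problems


bad = """\
Content-Signal: search=yes
User-agent: *
Content-Signals: search=yes
Content-Signal: search=1, ai-input=yes ai-train=no
"""
for problem in lint_robots(bad):
    print(problem)
```

On the sample input, the sketch flags one problem for each of the four mistake types; a file with a well-formed directive inside a group produces an empty list.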

Test Fixtures

pass-all-three.txt
warn-only-one-key.txt
warn-wrong-value.txt
fail-missing.txt
