aide

Scoring System

The final score is a number from 0 to 100. It is the weighted sum of five category sub-scores. Each sub-score is the weighted sum of the checks inside that category.

Anchors

  • 0: site fails every applicable check.
  • 50: meets the decade-old basics (robots.txt, a sitemap, content reachable as plain HTML with good semantics).
  • 75: ready for today's agents — llms.txt + markdown negotiation + AI bot rules + at least one discoverable capability (MCP card or API catalog or agent skills).
  • 90: exemplary — covers almost every standard, with only emerging/optional ones missing.
  • 100: platinum — every applicable check passes cleanly.

Levels

Levels are derived from the raw score and must be surfaced next to it everywhere.

Score    Level     Colour
0–49     Unrated   gray
50–69    Bronze    bronze (#cd7f32)
70–84    Silver    silver (#c0c0c0)
85–94    Gold      gold (#e6c200)
95–100   Platinum  platinum (#b5e3ff)
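The banding above can be sketched as a small helper (function and type names are illustrative, not from the codebase):

```typescript
type Level = "Unrated" | "Bronze" | "Silver" | "Gold" | "Platinum";

// Map a 0–100 integer score to its level band, per the table above.
function levelFor(score: number): Level {
  if (score >= 95) return "Platinum";
  if (score >= 85) return "Gold";
  if (score >= 70) return "Silver";
  if (score >= 50) return "Bronze";
  return "Unrated";
}
```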

Level names are also used on badges.

Category weights

Category            Weight  Rationale
Discoverability     25      If agents can't find it, nothing else matters.
Content             20      Readable content is the second-biggest unlock.
Bot Access Control  20      Controlling and authenticating bots is increasingly critical.
Capabilities        25      MCP/OAuth/Agent Skills define what an agent can do.
Commerce            10      Still emerging; weight is intentionally modest and will grow.

Categories sum to 100.

Per-check weights (within category)

Weights are integers; within a category they sum to that category's weight.

Discoverability (25)

Check         Weight
robots-txt    10
sitemap-xml   8
link-headers  7

Content (20)

Check                 Weight
markdown-negotiation  12
llms-txt              6
schema-org            2

Bot Access Control (20)

Check            Weight
ai-bot-rules     8
content-signals  7
web-bot-auth     5

Capabilities (25)

Check                     Weight
mcp-server-card           8
agent-skills              6
api-catalog               5
oauth-authz-server        3
oauth-protected-resource  2
webmcp                    1
a2a-agent-card            0 (V1 observational only; still surfaced)

Commerce (10)

Check  Weight
x402   4
ucp    3
acp    3

Extras (informational, weight 0 — surfaced but don't move the score)

  • openapi
  • rss-atom
  • security-txt
  • humans-txt

Reasoning: extras are nice signals, but we don't want them to inflate our scores relative to the baseline tool.

Applicability (profiles)

A check is skipped (not_applicable) when it doesn't apply to the selected profile.

Profile        Description               What's included
all (default)  Everything                Every check
content        Static / publishing site  Discoverability + Content + Bot Access Control; oauth-*, mcp-server-card, api-catalog, webmcp, and a2a-* off; all Commerce checks off
site           General web app           Everything except Commerce
api            API / backend service     Discoverability + Capabilities + web-bot-auth; Content checks off

When a check is not_applicable, its weight is removed from the denominator for that scan — the score is rescaled so users don't get penalised for skipping irrelevant checks.
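As a sketch of the rescaling arithmetic (the numbers are a hypothetical scan, not real data): if a profile switches off all Commerce checks, those 10 points leave the denominator.

```typescript
// Hypothetical scan: Commerce (10 points) is not_applicable,
// so the denominator shrinks from 100 to 90.
const earned = 72;                  // points earned across applicable checks
const applicableWeight = 100 - 10;  // Commerce's 10 points removed
const score = Math.round((100 * earned) / applicableWeight); // 80, not 72
```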

Formula

for each category C:
  applicableWeightC = sum(check.weight for check in C if status != not_applicable)
  earnedC           = sum(check.weight * check.score for check in C if status != not_applicable)
  if applicableWeightC == 0:
    categoryScoreC = null (not shown)
  else:
    categoryScoreC = round(100 * earnedC / applicableWeightC)

totalApplicableWeight = sum(applicableWeightC for each C)
totalEarned           = sum(earnedC for each C)
finalScore            = round(100 * totalEarned / totalApplicableWeight)

check.score is a float in [0..1] — most checks are 0 or 1; checks with sub-tests produce partial credit (e.g. llms-txt gives 0.5 for existence + 0.5 for being well-formed).
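The formula above can be sketched in TypeScript (types and names are illustrative; this is not the actual src/scoring engine):

```typescript
interface CheckResult {
  weight: number;                                     // integer points
  score: number;                                      // 0..1
  status: "pass" | "warn" | "fail" | "not_applicable";
}

// Category score: earned points over applicable points, or null if
// nothing in the category applies.
function categoryScore(checks: CheckResult[]): number | null {
  const applicable = checks.filter(c => c.status !== "not_applicable");
  const applicableWeight = applicable.reduce((s, c) => s + c.weight, 0);
  if (applicableWeight === 0) return null;
  const earned = applicable.reduce((s, c) => s + c.weight * c.score, 0);
  return Math.round((100 * earned) / applicableWeight);
}

// Final score: same ratio taken over every applicable check in every
// category. Assumes at least one check is applicable.
function finalScore(categories: CheckResult[][]): number {
  let totalWeight = 0;
  let totalEarned = 0;
  for (const checks of categories) {
    for (const c of checks) {
      if (c.status === "not_applicable") continue;
      totalWeight += c.weight;
      totalEarned += c.weight * c.score;
    }
  }
  return Math.round((100 * totalEarned) / totalWeight);
}
```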

Status mapping

Status          Contributes check.score         Visible
pass            1.0                             ✓ green
warn            0.5 (or check-defined partial)  ⚠ amber
fail            0.0                             ✕ rose
not_applicable  — (excluded)                    — gray

Sub-result credit

Checks with multiple sub-tests define their own partial scoring in their docs/checks/<id>.md file under a Scoring section. The engine trusts that value.

Example — robots-txt:

Sub-test                             Weight within check
Exists (200 OK, text/plain)          0.4
Points to at least one Sitemap       0.3
Has AI bot rules or Content-Signal   0.3

If the file exists and has AI rules but no sitemap reference → 0.4 + 0 + 0.3 = 0.7.
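The same partial-credit arithmetic as a sketch (sub-test weights from the table above; the data structure is illustrative, not the engine's):

```typescript
// robots-txt sub-tests for a hypothetical scan where the file exists
// and has AI rules, but points to no sitemap.
const subTests = [
  { name: "exists",      passed: true,  weight: 0.4 },
  { name: "sitemap-ref", passed: false, weight: 0.3 },
  { name: "ai-rules",    passed: true,  weight: 0.3 },
];
// Sum the weights of the passing sub-tests: 0.4 + 0.3 ≈ 0.7.
const checkScore = subTests.reduce((s, t) => s + (t.passed ? t.weight : 0), 0);
```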

Why the incumbent's weights don't apply here

The baseline tool (isitagentready.com) intentionally keeps Commerce out of the score (weight 0). We include it (weight 10) because we believe site owners want commerce progress to show on their badge. This is a deliberate difference and is documented on /learn/scoring.

Tie-breakers on the leaderboard

When two sites have the same integer score:

  1. Higher raw (non-rounded) score
  2. Most recent last_scanned
  3. Alphabetical host
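The tie-breakers above can be sketched as a comparator (field names are assumptions, not the actual schema):

```typescript
interface LeaderboardRow {
  host: string;
  rawScore: number;     // non-rounded score
  lastScanned: number;  // epoch milliseconds
}

// Descending by integer score; ties broken by higher raw score,
// then more recent scan, then host A→Z.
function compareRows(a: LeaderboardRow, b: LeaderboardRow): number {
  const ai = Math.round(a.rawScore);
  const bi = Math.round(b.rawScore);
  if (ai !== bi) return bi - ai;
  if (a.rawScore !== b.rawScore) return b.rawScore - a.rawScore;
  if (a.lastScanned !== b.lastScanned) return b.lastScanned - a.lastScanned;
  return a.host.localeCompare(b.host);
}
```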

Changing weights later

Weights are stored in src/scoring/weights.ts. A scan stores the weight it used per check in check_results.weight. This means:

  • Old scans keep their old score (historical stability)
  • New scans use the new weights
  • Leaderboard always uses sites.latest_score (which picks up the new weights the next time a site is rescanned)
  • We announce weight changes in the changelog and a /learn/scoring diff table

Scoring edge cases (explicit)

  • Site blocks our UA: return error with code blocked — do not score.
  • Site returns 5xx on homepage: error — do not score.
  • Check's probed URL is 404: the check decides; most return fail, web-bot-auth returns not_applicable (endpoint optional).
  • HTTPS-only site with an invalid certificate: fail the scan at ingest with tls_invalid unless the user opts in via a checkbox (not in V1).
  • Redirects off-origin: we follow up to 5; if the final host differs significantly (the eTLD+1 changes), we note it but still scan the final URL. The scan record stores both requested_url and the canonical URL.