aide

Scoring System

The final score is a number from 0 to 100. It is the weighted sum of five category sub-scores. Each sub-score is the weighted sum of the checks inside that category.

Anchors

  • 0: site fails every applicable check.
  • 50: meets the decade-old basics (robots.txt, a sitemap, content reachable as plain HTML with good semantics).
  • 75: ready for today's agents — llms.txt + markdown negotiation + AI bot rules + at least one discoverable capability (MCP card or API catalog or agent skills).
  • 90: exemplary — covers almost every standard, with only emerging/optional ones missing.
  • 100: platinum — every applicable check passes cleanly.

Levels

Levels are derived from the raw score and must be surfaced next to it everywhere.

Score    Level     Colour
0–49     Unrated   gray
50–69    Bronze    bronze (#cd7f32)
70–84    Silver    silver (#c0c0c0)
85–94    Gold      gold (#e6c200)
95–100   Platinum  platinum (#b5e3ff)
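The banding above can be sketched as a small helper (function and type names are illustrative, not from the codebase):

```typescript
type Level = "Unrated" | "Bronze" | "Silver" | "Gold" | "Platinum";

// Map a 0–100 integer score to its level band, per the table above.
function levelFor(score: number): Level {
  if (score >= 95) return "Platinum";
  if (score >= 85) return "Gold";
  if (score >= 70) return "Silver";
  if (score >= 50) return "Bronze";
  return "Unrated";
}
```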

Level names are also used on badges.

Category weights

Category            Weight  Rationale
Discoverability     25      If agents can't find it, nothing else matters.
Content             20      Readable content is the second-biggest unlock.
Bot Access Control  20      Controlling and authenticating bots is increasingly critical.
Capabilities        25      MCP/OAuth/Agent Skills define what an agent can do.
Commerce            10      Still emerging; weight is intentionally modest and will grow.

Categories sum to 100.

Per-check weights (within category)

Weights are integers; within a category they sum to that category's weight.

Discoverability (25)

Check         Weight
robots-txt    10
sitemap-xml   8
link-headers  7

Content (20)

Check                 Weight
markdown-negotiation  12
llms-txt              6
schema-org            2

Bot Access Control (20)

Check            Weight
ai-bot-rules     8
content-signals  7
web-bot-auth     5

Capabilities (25)

Check                     Weight
mcp-server-card           8
agent-skills              6
api-catalog               5
oauth-authz-server        3
oauth-protected-resource  2
webmcp                    1
a2a-agent-card            0 (V1 observational only; still surfaced)

Commerce (10)

Check  Weight
x402   4
ucp    3
acp    3

Extras (informational, weight 0 — surfaced but don't move the score)

  • openapi
  • rss-atom
  • security-txt
  • humans-txt

Reasoning: extras are nice signals, but we don't want them to inflate our scores relative to the baseline tool.

Applicability (profiles)

A check is skipped (not_applicable) when it doesn't apply to the selected profile.

Profile        Description               What's included
all (default)  Everything                Every check
content        Static / publishing site  Discoverability + Content + Bot Access Control; oauth-*, mcp-server-card, api-catalog, webmcp, and a2a-* off; all Commerce checks off
site           General web app           Everything except Commerce
api            API / backend service     Discoverability + Capabilities + web-bot-auth; Content checks off

When a check is not_applicable, its weight is removed from the denominator for that scan — the score is rescaled so users don't get penalised for skipping irrelevant checks.
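As a sketch of the rescaling arithmetic (the numbers are a hypothetical scan, not real data): if a profile switches off all Commerce checks, those 10 points leave the denominator.

```typescript
// Hypothetical scan: Commerce (10 points) is not_applicable,
// so the denominator shrinks from 100 to 90.
const earned = 72;                  // points earned across applicable checks
const applicableWeight = 100 - 10;  // Commerce's 10 points removed
const score = Math.round((100 * earned) / applicableWeight); // 80, not 72
```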

Formula

for each category C:
  applicableWeightC = sum(check.weight for check in C if status != not_applicable)
  earnedC           = sum(check.weight * check.score for check in C if status != not_applicable)
  if applicableWeightC == 0:
    categoryScoreC = null (not shown)
  else:
    categoryScoreC = round(100 * earnedC / applicableWeightC)

totalApplicableWeight = sum(applicableWeightC for each C)
totalEarned           = sum(earnedC for each C)
finalScore            = round(100 * totalEarned / totalApplicableWeight)

check.score is a float in [0..1] — most checks are 0 or 1; checks with sub-tests produce partial credit (e.g. llms-txt gives 0.5 for existence + 0.5 for being well-formed).
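The formula above can be sketched in TypeScript (types and names are illustrative; this is not the actual src/scoring engine):

```typescript
interface CheckResult {
  weight: number;                                     // integer points
  score: number;                                      // 0..1
  status: "pass" | "warn" | "fail" | "not_applicable";
}

// Category score: earned points over applicable points, or null if
// nothing in the category applies.
function categoryScore(checks: CheckResult[]): number | null {
  const applicable = checks.filter(c => c.status !== "not_applicable");
  const applicableWeight = applicable.reduce((s, c) => s + c.weight, 0);
  if (applicableWeight === 0) return null;
  const earned = applicable.reduce((s, c) => s + c.weight * c.score, 0);
  return Math.round((100 * earned) / applicableWeight);
}

// Final score: same ratio taken over every applicable check in every
// category. Assumes at least one check is applicable.
function finalScore(categories: CheckResult[][]): number {
  let totalWeight = 0;
  let totalEarned = 0;
  for (const checks of categories) {
    for (const c of checks) {
      if (c.status === "not_applicable") continue;
      totalWeight += c.weight;
      totalEarned += c.weight * c.score;
    }
  }
  return Math.round((100 * totalEarned) / totalWeight);
}
```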

Status mapping

Status          Contributes check.score         Visible
pass            1.0                             ✓ green
warn            0.5 (or check-defined partial)  ⚠ amber
fail            0.0                             ✕ rose
not_applicable  — (excluded)                    — gray

Sub-result credit

Checks with multiple sub-tests define their own partial scoring in their docs/checks/<id>.md file under a Scoring section. The engine trusts that value.

Example — robots-txt:

Sub-test                             Weight within check
Exists (200 OK, text/plain)          0.4
Points to at least one Sitemap       0.3
Has AI bot rules or Content-Signal   0.3

If the file exists and has AI rules but no sitemap reference → 0.4 + 0 + 0.3 = 0.7.
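The same partial-credit arithmetic as a sketch (sub-test weights from the table above; the data structure is illustrative, not the engine's):

```typescript
// robots-txt sub-tests for a hypothetical scan where the file exists
// and has AI rules, but points to no sitemap.
const subTests = [
  { name: "exists",      passed: true,  weight: 0.4 },
  { name: "sitemap-ref", passed: false, weight: 0.3 },
  { name: "ai-rules",    passed: true,  weight: 0.3 },
];
// Sum the weights of the passing sub-tests: 0.4 + 0.3 ≈ 0.7.
const checkScore = subTests.reduce((s, t) => s + (t.passed ? t.weight : 0), 0);
```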

Why the incumbent's weights don't apply here

The baseline tool (isitagentready.com) intentionally keeps Commerce out of the score (weight 0). We include it (weight 10) because we believe site owners want commerce progress to show on their badge. This is a deliberate difference and is documented on /learn/scoring.

Tie-breakers on the leaderboard

When two sites have the same integer score:

  1. Higher raw (non-rounded) score
  2. Most recent last_scanned
  3. Alphabetical host
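The tie-breakers above can be sketched as a comparator (field names are assumptions, not the actual schema):

```typescript
interface LeaderboardRow {
  host: string;
  rawScore: number;     // non-rounded score
  lastScanned: number;  // epoch milliseconds
}

// Descending by integer score; ties broken by higher raw score,
// then more recent scan, then host A→Z.
function compareRows(a: LeaderboardRow, b: LeaderboardRow): number {
  const ai = Math.round(a.rawScore);
  const bi = Math.round(b.rawScore);
  if (ai !== bi) return bi - ai;
  if (a.rawScore !== b.rawScore) return b.rawScore - a.rawScore;
  if (a.lastScanned !== b.lastScanned) return b.lastScanned - a.lastScanned;
  return a.host.localeCompare(b.host);
}
```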

Changing weights later

Weights are stored in src/scoring/weights.ts. A scan stores the weight it used per check in check_results.weight. This means:

  • Old scans keep their old score (historical stability)
  • New scans use the new weights
  • Leaderboard always uses sites.latest_score (which picks up the new weights the next time a site is rescanned)
  • We announce weight changes in the changelog and a /learn/scoring diff table

Scoring edge cases (explicit)

  • Site blocks our UA: return error with code blocked — do not score.
  • Site returns 5xx on homepage: error — do not score.
  • Check's probed URL is 404: the check decides; most return fail, web-bot-auth returns not_applicable (endpoint optional).
  • HTTPS-only site with an invalid certificate: fail the scan at ingest with tls_invalid unless the user opts in via a checkbox (not in V1).
  • Redirects off-origin: we follow up to 5; if the final host differs significantly (the eTLD+1 changes), we note it but still scan the final URL. The scan record stores both requested_url and the canonical URL.