The final score is a number from 0 to 100. It is the weighted sum of five category sub-scores. Each sub-score is the weighted sum of the checks inside that category.
Anchors
- 0: site fails every applicable check.
- 50: meets the basics that have existed for a decade (robots, sitemap, content is plain-reachable HTML with good semantics).
- 75: ready for today's agents — llms.txt + markdown negotiation + AI bot rules + at least one discoverable capability (MCP card or API catalog or agent skills).
- 90: exemplary — covers almost every standard, with only emerging/optional ones missing.
- 100: platinum — every applicable check passes cleanly.
Levels
Levels are derived from the raw score and must be surfaced next to it everywhere.
| Score | Level | Colour |
|---|---|---|
| 0–49 | Unrated | gray |
| 50–69 | Bronze | bronze (#cd7f32) |
| 70–84 | Silver | silver (#c0c0c0) |
| 85–94 | Gold | gold (#e6c200) |
| 95–100 | Platinum | platinum (#b5e3ff) |
Level names are also used on badges.
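The threshold table above can be sketched as a small helper. This is illustrative only — the type and function names are not taken from the codebase:

```typescript
type Level = "Unrated" | "Bronze" | "Silver" | "Gold" | "Platinum";

// Thresholds mirror the table above, checked from highest to lowest.
function levelForScore(score: number): Level {
  if (score >= 95) return "Platinum";
  if (score >= 85) return "Gold";
  if (score >= 70) return "Silver";
  if (score >= 50) return "Bronze";
  return "Unrated";
}
```

Checking from the top down means each band's lower bound is the only number that matters, which keeps the boundaries (50/70/85/95) easy to audit against the table.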
Category weights
| Category | Weight | Rationale |
|---|---|---|
| Discoverability | 25 | If agents can't find it, nothing else matters. |
| Content | 20 | Content being readable is the second-biggest unlock. |
| Bot Access Control | 20 | Controlling and authenticating bots is increasingly critical. |
| Capabilities | 25 | MCP/OAuth/Agent Skills define what an agent can do. |
| Commerce | 10 | Still emerging; weight is intentionally modest. Will grow. |
Categories sum to 100.
Per-check weights (within category)
Weights are whole integers. Within a category they sum to that category's weight.
Discoverability (25)
| Check | Weight |
|---|---|
| robots-txt | 10 |
| sitemap-xml | 8 |
| link-headers | 7 |
Content (20)
| Check | Weight |
|---|---|
| markdown-negotiation | 12 |
| llms-txt | 6 |
| schema-org | 2 |
Bot Access Control (20)
| Check | Weight |
|---|---|
| ai-bot-rules | 8 |
| content-signals | 7 |
| web-bot-auth | 5 |
Capabilities (25)
| Check | Weight |
|---|---|
| mcp-server-card | 8 |
| agent-skills | 6 |
| api-catalog | 5 |
| oauth-authz-server | 3 |
| oauth-protected-resource | 2 |
| webmcp | 1 |
| a2a-agent-card | 0 (V1 observational only — still surfaced) |
Commerce (10)
| Check | Weight |
|---|---|
| x402 | 4 |
| ucp | 3 |
| acp | 3 |
Extras (informational, weight 0 — surfaced but don't move the score)
- openapi
- rss-atom
- security-txt
- humans-txt

Reasoning: extras are nice signals, but we don't want to inflate scores relative to the baseline tool.
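The weight tables above could live in a single config object. This is a hypothetical shape only — the real `src/scoring/weights.ts` may be structured differently:

```typescript
// Hypothetical shape for src/scoring/weights.ts: per-check weights grouped
// by category. Weight-0 entries (a2a-agent-card) are surfaced but don't score.
const WEIGHTS: Record<string, Record<string, number>> = {
  discoverability: { "robots-txt": 10, "sitemap-xml": 8, "link-headers": 7 },
  content: { "markdown-negotiation": 12, "llms-txt": 6, "schema-org": 2 },
  botAccessControl: { "ai-bot-rules": 8, "content-signals": 7, "web-bot-auth": 5 },
  capabilities: {
    "mcp-server-card": 8,
    "agent-skills": 6,
    "api-catalog": 5,
    "oauth-authz-server": 3,
    "oauth-protected-resource": 2,
    "webmcp": 1,
    "a2a-agent-card": 0, // V1 observational only
  },
  commerce: { "x402": 4, "ucp": 3, "acp": 3 },
};
```

A shape like this makes the invariant trivial to assert in a unit test: each category's check weights sum to exactly that category's weight, and the category weights sum to 100.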
Applicability (profiles)
A check is skipped (not_applicable) when it doesn't apply to the selected profile.
| Profile | Description | What's included |
|---|---|---|
| all (default) | Everything | Every check |
| content | Static / publishing sites | Discoverability + Content + Bot Access; oauth-*, mcp-server-card, api-catalog, webmcp, a2a-*, and all Commerce checks off |
| site | General web app | Everything except Commerce |
| api | API/backend service | Discoverability + Capabilities + web-bot-auth; Content checks off |
When a check is not_applicable, its weight is removed from the denominator for that scan — the score is rescaled so users don't get penalised for skipping irrelevant checks.
Formula
```
for each category C:
    applicableWeight[C] = sum(check.weight for check in C if check.status != not_applicable)
    earned[C] = sum(check.weight * check.score for check in C if check.status != not_applicable)
    if applicableWeight[C] == 0:
        categoryScore[C] = null   # category not shown
    else:
        categoryScore[C] = round(100 * earned[C] / applicableWeight[C])

totalApplicableWeight = sum(applicableWeight[C] over all C)
totalEarned = sum(earned[C] over all C)
finalScore = round(100 * totalEarned / totalApplicableWeight)
```
check.score is a float in [0..1] — most checks are 0 or 1; checks with sub-tests produce partial credit (e.g. llms-txt gives 0.5 for existence + 0.5 for being well-formed).
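The formula can be sketched in TypeScript. The type and field names here are illustrative, not the project's actual types:

```typescript
type Status = "pass" | "warn" | "fail" | "not_applicable";

interface CheckResult {
  weight: number;
  score: number; // float in [0..1]
  status: Status;
}

// Compute the final 0–100 score. not_applicable checks drop out of the
// denominator, so skipped checks don't penalise the site.
function finalScore(categories: CheckResult[][]): number | null {
  let totalApplicableWeight = 0;
  let totalEarned = 0;
  for (const checks of categories) {
    const applicable = checks.filter((c) => c.status !== "not_applicable");
    totalApplicableWeight += applicable.reduce((s, c) => s + c.weight, 0);
    totalEarned += applicable.reduce((s, c) => s + c.weight * c.score, 0);
  }
  if (totalApplicableWeight === 0) return null; // nothing applicable at all
  return Math.round((100 * totalEarned) / totalApplicableWeight);
}
```

Note the rescaling in action: a site scanned under the `content` profile competes over a smaller denominator, so passing every applicable check still yields 100.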
Status mapping
| Status | Contributes check.score | Visible |
|---|---|---|
| pass | 1.0 | ✓ green |
| warn | 0.5 (or check-defined partial) | ⚠ amber |
| fail | 0.0 | ✕ rose |
| not_applicable | — (excluded) | — gray |
Sub-result credit
Checks with multiple sub-tests define their own partial scoring in their docs/checks/<id>.md file under a Scoring section. The engine trusts that value.
Example — robots-txt:
| Sub-test | Weight within check |
|---|---|
| Exists (200 OK, text/plain) | 0.4 |
| Points to at least one Sitemap | 0.3 |
| Has AI bot rules or Content-Signal | 0.3 |
If the file exists and has AI rules but points to no sitemap → 0.4 + 0 + 0.3 = 0.7.
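The worked example above is a plain weighted sum over boolean sub-results. The weights come from the table; the function and field names are illustrative:

```typescript
// Sub-test weights for the robots-txt check, per the table above.
const ROBOTS_SUBTESTS = { exists: 0.4, pointsToSitemap: 0.3, hasAiRules: 0.3 };

interface RobotsResults {
  exists: boolean;
  pointsToSitemap: boolean;
  hasAiRules: boolean;
}

// Returns the check's partial score in [0..1].
function robotsTxtScore(r: RobotsResults): number {
  let score = 0;
  if (r.exists) score += ROBOTS_SUBTESTS.exists;
  if (r.pointsToSitemap) score += ROBOTS_SUBTESTS.pointsToSitemap;
  if (r.hasAiRules) score += ROBOTS_SUBTESTS.hasAiRules;
  return score;
}
```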
Why the incumbent's weights don't apply here
The baseline tool (isitagentready.com) intentionally keeps Commerce out of the score (weight 0). We include it (weight 10) because we believe site owners want commerce progress to show on their badge. This is a deliberate difference and is documented on /learn/scoring.
Tie-breakers on the leaderboard
When two sites have the same integer score:
- Higher raw (non-rounded) score
- Most recent
last_scanned - Alphabetical host
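The tie-breaking order can be expressed as a comparator. This is a sketch — apart from `last_scanned`, the field names are illustrative:

```typescript
interface RankedSite {
  host: string;
  rawScore: number;     // non-rounded score
  last_scanned: string; // ISO 8601 timestamp
}

// Descending by raw score, then most recent scan first, then host A→Z.
// ISO timestamps compare chronologically as strings, so no Date parsing
// is needed.
function compareSites(a: RankedSite, b: RankedSite): number {
  if (a.rawScore !== b.rawScore) return b.rawScore - a.rawScore;
  if (a.last_scanned !== b.last_scanned) {
    return b.last_scanned.localeCompare(a.last_scanned);
  }
  return a.host.localeCompare(b.host);
}
```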
Changing weights later
Weights are stored in `src/scoring/weights.ts`. A scan stores the weight it used per check in `check_results.weight`. This means:
- Old scans keep their old score (historical stability)
- New scans use the new weights
- Leaderboard always uses `sites.latest_score` (rescanned under the new weights at the next scan)
- We announce weight changes in the changelog and a `/learn/scoring` diff table
Scoring edge cases (explicit)
- Site blocks our UA: return `error` with code `blocked` — do not score.
- Site returns 5xx on homepage: `error` — do not score.
- Check's probed URL is 404: the check decides; most return `fail`, but `web-bot-auth` returns `not_applicable` (the endpoint is optional).
- HTTPS only with invalid cert: fail the scan at ingest with `tls_invalid` unless the user opts in via checkbox (not in V1).
- Redirects off-origin: we follow up to 5; if the final host differs significantly (the `etld1` changes), we note it but still scan the final URL. The scan record stores both `requested_url` and the canonical URL.