llms-full.txt + Markdown Content Negotiation: Caddy/Next.js Setup

llms.txt is no longer a single file. Alongside full-content versions such as llms-full.txt, you need content negotiation infrastructure that serves the same URL as markdown to agents and as HTML to browsers. AIDE's markdown-negotiation and llms-full-txt-pass checks test exactly this dual delivery.

Context: what exactly does AIDE look at in this check?

markdown-negotiation tests all three paths:

Does the same URL return markdown when requested with the Accept: text/markdown header?
Does the same URL return markdown when requested with the ?format=md query parameter?
Is the Vary: Accept header set correctly (otherwise the CDN cache gets mixed up)?

llms-full-txt-pass additionally verifies that /llms-full.txt (a) returns HTTP 200 and (b) has a content hash that changes, i.e. whether the file is really generated dynamically or just sits empty.
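
The request-side part of these checks can be sketched as a small decision function. This is an illustration only, not AIDE's implementation: `wantsMarkdown` and `negotiationHeaders` are hypothetical names, and the sketch ignores q-values (a `text/markdown;q=0` entry would still count as a match).

```typescript
// Hypothetical helper: decides which representation a request is asking for.
// It mirrors the two request-side paths: the Accept header and ?format=md.
function wantsMarkdown(url: string, accept: string | null): boolean {
  // Query-string override: ?format=md
  if (new URL(url).searchParams.get("format") === "md") return true;
  if (!accept) return false;
  // Split on commas, drop parameters such as ;q=0.9, compare media types.
  return accept
    .split(",")
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase())
    .includes("text/markdown");
}

// Headers a negotiated response should carry in either representation.
// Vary: Accept belongs on the HTML variant too, not only on the markdown one.
function negotiationHeaders(markdown: boolean): Record<string, string> {
  return {
    "Content-Type": markdown
      ? "text/markdown; charset=utf-8"
      : "text/html; charset=utf-8",
    Vary: "Accept",
  };
}
```

The function takes an absolute URL because `new URL` rejects bare paths.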

Why two paths (Accept header + query string)?

In theory, header-based negotiation with Vary: Accept is enough. In practice:

Cloudflare Free and Pro do not fully respect Vary: Accept, so cached HTML can be served to an agent that asked for markdown
Some LLM gateways strip the Accept header
For manual testing by a human, a query string is easier

That is why you need to keep both paths.
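
A toy model makes the cache argument concrete. It is not how any real CDN computes its cache key; the two key functions and the three sample requests are assumptions made up for illustration.

```typescript
// Toy model of a shared cache; not how any real CDN is implemented.
type CachedRequest = { url: string; accept: string };

// Cache that ignores Vary: only the URL (path + query) identifies an entry.
function keyIgnoringVary(req: CachedRequest): string {
  return req.url;
}

// Cache that honours "Vary: Accept": the Accept value is part of the key.
function keyHonouringVary(req: CachedRequest): string {
  return `${req.url}|accept=${req.accept}`;
}

const browser: CachedRequest = { url: "/", accept: "text/html" };
const agentHeader: CachedRequest = { url: "/", accept: "text/markdown" };
const agentQuery: CachedRequest = { url: "/?format=md", accept: "*/*" };

// Header-only negotiation collides when Vary is ignored:
// the browser's HTML entry would be reused for the agent.
const collides = keyIgnoringVary(browser) === keyIgnoringVary(agentHeader);

// The query-string variant gets its own entry even when Vary is ignored.
const separate = keyIgnoringVary(browser) !== keyIgnoringVary(agentQuery);
```

In this model `collides` and `separate` are both true, which is the reason the query-string path works as a fallback when the header path cannot be trusted.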

Minimal setup with Caddy (aide.tr example)

The setup running on aide.tr looks like this and can be copied and pasted:

aide.tr, www.aide.tr {
    encode gzip zstd

    # 1) Accept-header-based negotiation
    @wantsMarkdown {
        header_regexp Accept text/markdown
        path / /tr /tr/ /en /en/
    }
    rewrite @wantsMarkdown /llms-full.txt
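    # Vary goes on every response for these paths, HTML included,
    # so shared caches key on Accept for both representations
    @negotiated path / /tr /tr/ /en /en/
    header @negotiated Vary "Accept, Accept-Encoding"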

    # 2) Query string fallback: the CF cache key includes the query string,
    #    so /?format=md and / are separate cache entries, no leak
    @formatMd {
        query format=md
        path / /tr /tr/ /en /en/
    }
    rewrite @formatMd /llms-full.txt

    # 3) Link header (RFC 8288): AI agents can discover the alternate
    #    representation from this header with a single HEAD request
    header {
        Link "<https://aide.tr/llms-full.txt>; rel=\"alternate\"; type=\"text/markdown\"; title=\"LLM-friendly full content\""
    }

    reverse_proxy web:3000 {
        header_up X-Forwarded-Proto https
    }
}

Three building blocks:

@wantsMarkdown matcher: if the Accept header matches the text/markdown regex, the request is rewritten, not redirected. The agent asked for the same URL and gets the same URL.
@formatMd matcher: the query string fallback. Because the query string is part of the CDN cache key, this produces two separate cache entries and eliminates the leak risk.
Link header: RFC 8288. With rel=alternate it advertises the markdown version in a single HEAD request.
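
On the agent side, discovery through the Link header could look roughly like the sketch below. It is a minimal reader for simple values such as the one in the Caddy block, not a full RFC 8288 parser: it assumes no commas or semicolons inside URLs or quoted values. `parseLinkHeader` and `findMarkdownAlternate` are hypothetical names.

```typescript
// Minimal Link-header reader for simple values; not a complete RFC 8288 parser.
// Assumption: no commas or semicolons appear inside URLs or quoted parameters.
type LinkValue = { href: string; params: Record<string, string> };

function parseLinkHeader(header: string): LinkValue[] {
  const links: LinkValue[] = [];
  for (const part of header.split(",")) {
    // Each link looks like: <url>; name="value"; name2="value2"
    const match = part.trim().match(/^<([^>]*)>(.*)$/);
    if (!match) continue;
    const params: Record<string, string> = {};
    for (const raw of (match[2] ?? "").split(";")) {
      const eq = raw.indexOf("=");
      if (eq === -1) continue;
      const name = raw.slice(0, eq).trim().toLowerCase();
      // Strip one pair of surrounding double quotes, if present.
      const value = raw.slice(eq + 1).trim().replace(/^"|"$/g, "");
      params[name] = value;
    }
    links.push({ href: match[1] ?? "", params });
  }
  return links;
}

// Returns the URL of the first rel=alternate link typed as text/markdown.
function findMarkdownAlternate(header: string): string | undefined {
  return parseLinkHeader(header).find(
    (l) => l.params["rel"] === "alternate" && l.params["type"] === "text/markdown",
  )?.href;
}
```

An agent would read the `Link` response header from a HEAD request and pass its value to `findMarkdownAlternate`.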

Next.js side: generating llms-full.txt

This is dynamic generation, not a static file. app/llms-full.txt/route.ts:

// app/llms-full.txt/route.ts
import { listLearnArticles, listFeatured } from "@/lib/learn"
import { NextResponse } from "next/server"

export const revalidate = 3600  // 1 hour

export async function GET() {
  const [articles, featured] = await Promise.all([
    listLearnArticles("tr"),
    listFeatured(20, "tr"),
  ])

  const lines: string[] = []

  // Top-level heading
  lines.push("# AIDE — AI Detect Engine")
  lines.push("> Siteniz yapay zekâ ajanlarına hazır mı?")
  lines.push("")

  // Featured articles: full content
  for (const article of featured) {
    lines.push(`## ${article.title}`)
    lines.push("")
    lines.push(article.body)  // raw markdown
    lines.push("")
    lines.push(`Source: https://aide.tr/learn/${article.slug}`)
    lines.push("")  // blank line so the "---" below is a rule, not a setext heading
    lines.push("---")
    lines.push("")
  }

  // Other articles: link only
  lines.push("## Diğer Makaleler")
  lines.push("")
  for (const article of articles) {
    lines.push(`- [${article.title}](https://aide.tr/learn/${article.slug}) — ${article.description}`)
  }

  const body = lines.join("\n")

  return new NextResponse(body, {
    headers: {
      "Content-Type": "text/markdown; charset=utf-8",
      "Cache-Control": "public, max-age=300, s-maxage=3600",
      "X-Robots-Tag": "noindex",  // This URL is for crawlers; keep it out of the search index
    },
  })
}

Three important decisions:

| Decision | Why |
|---|---|
| revalidate = 3600 | Rebuild once an hour; checking whether the content changed on every request eats too much quota |
| s-maxage=3600 but max-age=300 | The edge keeps it for 1 hour, the browser for 5 minutes, so a user who hits F5 sees fresh content |
| X-Robots-Tag: noindex | Google may treat this URL as an ordinary page; keep it out of the index |

llms.txt vs llms-full.txt: what is the difference?

The two files complement each other:

llms.txt = your site's map. A link to every article with a short summary. Not long: 5-50 KB.
llms-full.txt = your site's library. The full text of the featured content plus the map. Between 100 KB and 5 MB.

Which one takes priority?

Agent is asking for information → llms-full.txt (the answer is already there, no extra fetch)
Agent is looking for a specific topic → llms.txt (it finds the relevant link on the map and fetches only that)

Publish both. AIDE evaluates all three checks:

llms-txt-exists (does llms.txt exist)
llms-full-txt-exists (does llms-full.txt exist)
llms-full-txt-pass (is the content really filled in, or is it just a placeholder)
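
To show how the map differs from the library, here is a sketch of a llms.txt builder that could be fed the same article list as the llms-full.txt route. `ArticleSummary` and `buildLlmsTxt` are hypothetical; the real article shape lives in `@/lib/learn` and is not shown in this article, and the `/learn/` URL prefix is taken from the route code above.

```typescript
// Hypothetical article shape, assumed for illustration only.
type ArticleSummary = { title: string; slug: string; description: string };

// Builds the "map": one link plus a short note per article, no bodies.
function buildLlmsTxt(
  siteTitle: string,
  baseUrl: string,
  articles: ArticleSummary[],
): string {
  const lines: string[] = [`# ${siteTitle}`, "", "## Articles", ""];
  for (const a of articles) {
    lines.push(`- [${a.title}](${baseUrl}/learn/${a.slug}): ${a.description}`);
  }
  return lines.join("\n") + "\n";
}
```

Feeding both builders from one list keeps the map and the library from drifting apart.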

Common mistakes

| Mistake | Symptom | Fix |
|---|---|---|
| Query string negotiation only | AIDE's markdown-negotiation reports PASS, but the Cloudflare cache leaks in production | Add the Vary: Accept header as well and test it |
| llms-full.txt is empty or says "Coming soon" | llms-full-txt-pass FAIL, the content hash is empty | Fill the file with real content at build time; do not leave a placeholder |
| Content-Type: text/plain | Some parsers do not parse it as markdown | text/markdown; charset=utf-8 is required |
| Cache TTL is 24 hours but the content changes every 5 minutes | Stale content reaches the agent | Tie s-maxage to how often the content changes, at most 1 hour |

Test: what AIDE does

# Accept header test
curl -i https://siteniz.com/ -H 'Accept: text/markdown' | head -20
# Expected: Content-Type: text/markdown, body is a markdown document

# Query string test
curl -i 'https://siteniz.com/?format=md' | head -5
# Expected: same as above

# Vary header
curl -I https://siteniz.com/ | grep -i vary
# Expected: Vary: Accept, Accept-Encoding

# llms-full.txt size
curl -s https://siteniz.com/llms-full.txt | wc -c
# Expected: ≥10000 bytes (must not be a placeholder)

If all of these hold, the three related checks come back as PASS in an AIDE scan.
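
The same expectations can be written as a pure function for a CI step. This is a sketch under assumptions, not AIDE's actual logic: the `Probe` shape and `checkMarkdownResponse` are hypothetical, and the 10000-byte threshold is taken from the curl example above.

```typescript
// Hypothetical summary of one HTTP response, filled in by whatever client you use.
type Probe = {
  status: number;
  contentType: string | null;
  vary: string | null;
  bodyBytes: number;
};

// Returns a list of problems; an empty list means the probe looks healthy.
function checkMarkdownResponse(p: Probe, minBytes = 10_000): string[] {
  const problems: string[] = [];
  if (p.status !== 200) problems.push(`unexpected status ${p.status}`);
  if (!(p.contentType ?? "").toLowerCase().startsWith("text/markdown")) {
    problems.push("Content-Type is not text/markdown");
  }
  // Vary is a comma-separated list; field names are case-insensitive.
  const varyFields = (p.vary ?? "").split(",").map((v) => v.trim().toLowerCase());
  if (!varyFields.includes("accept")) problems.push("Vary does not list Accept");
  if (p.bodyBytes < minBytes) {
    problems.push(`body is ${p.bodyBytes} bytes, expected at least ${minBytes}`);
  }
  return problems;
}
```

Keeping the check free of network calls makes it easy to unit-test; the probe itself can come from `fetch` or from parsed curl output.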

For production

Compression: llms-full.txt can exceed 1 MB, so gzip/zstd is required; encode gzip zstd must be in the Caddy block
Per-locale: a separate file for each of TR and EN. The URLs are /llms-full.txt (default language) and /en/llms-full.txt
Sitemaps + llms.txt sync: build both files from your article list so they do not drift
Analytics: log llms-full.txt requests separately. Seeing which agents (by UA) fetch it most is critical for strategy
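
For the analytics point, a UA-based tally could start from something like the sketch below. The token list is an illustrative assumption (a few publicly documented crawler names), not a complete or authoritative registry, and `classifyAgent` and `tallyAgents` are hypothetical helpers.

```typescript
// Example crawler tokens; extend or replace with the agents you care about.
const AGENT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"] as const;

// Maps a User-Agent string to a coarse label for logging.
function classifyAgent(userAgent: string | null): string {
  if (!userAgent) return "unknown";
  const ua = userAgent.toLowerCase();
  return AGENT_TOKENS.find((t) => ua.includes(t.toLowerCase())) ?? "other";
}

// Counts llms-full.txt requests per label.
function tallyAgents(userAgents: Array<string | null>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const ua of userAgents) {
    const key = classifyAgent(ua);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

User-Agent strings are easy to spoof, so treat the tally as a rough signal rather than an exact measurement.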

Related resources

llmstxt.org spec
RFC 8288: Web Linking
Caddy reverse_proxy + matchers
AIDE check details: /learn/markdown-negotiation, /learn/llms-full-txt-exists, /learn/llms-full-txt-pass
Introduction to llms.txt: /learn/llms-txt-nedir
llms.txt writing guide: /learn/llms-txt-rehberi