Docs to llm.txt

Distill an entire docs site into a single llm.txt that you can paste into any LLM context, RAG pipeline, or local model. Useful for unfamiliar libraries, internal wikis, and product documentation.

Flow

1. Distill the index page with include: ["links"] to discover all linked URLs
2. Filter the link list by URL pattern (e.g. /docs/, /guide/)
3. Feed the filtered URLs into /batch/distill
4. Concatenate the resulting Markdown into a single file
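
The filtering step is easy to get subtly wrong (anchors, duplicates, links that leave the site), so here is a minimal sketch of one way to do it. The helper name `filter_doc_urls` and its parameters are illustrative, not part of the API, and it assumes the link list is a list of absolute URL strings.

```python
import re
from urllib.parse import urldefrag, urlparse


def filter_doc_urls(links, base_url, patterns=(r"/docs/", r"/guide/")):
    """Keep same-host links whose path matches any pattern; drop fragments and duplicates.

    Hypothetical helper: assumes `links` holds absolute URLs. Relative links
    have no host and are skipped.
    """
    host = urlparse(base_url).netloc
    compiled = [re.compile(p) for p in patterns]
    seen = set()
    kept = []
    for link in links:
        url, _fragment = urldefrag(link)  # strip "#section" anchors
        parsed = urlparse(url)
        if parsed.netloc != host:
            continue  # skip links that leave the docs site
        if not any(p.search(parsed.path) for p in compiled):
            continue  # path matches none of the patterns
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return sorted(kept)  # sorted so repeated runs submit URLs in the same order
```

Stripping fragments before deduplicating avoids distilling the same page once per in-page anchor.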

Implementation

import re

import httpx

API = "https://openapi.thunderbit.com/openapi/v1"
H = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Pull the index page + outbound links
index = httpx.post(f"{API}/distill",
                   headers=H,
                   json={"url": "https://docs.example.com",
                         "include": ["links"]}).json()["data"]

# 2. Filter to docs paths
doc_urls = [u for u in index["links"] if re.search(r"/docs/", u)]

# 3. Batch distill
job = httpx.post(f"{API}/batch/distill",
                 headers=H,
                 json={"urls": doc_urls}).json()["data"]

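# 4. Poll until the batch job has finished, then concatenate
# (poll loop omitted; see the RAG Knowledge Base recipe.
#  `job` below must be the finished job payload, including "results")

# Write UTF-8 explicitly so non-ASCII characters in the docs survive on every platform
with open("llm.txt", "w", encoding="utf-8") as f:
    for r in job["results"]:
        if r["status"] == "SUCCEEDED":
            f.write(f"# {r['url']}\n\n{r['markdown']}\n\n---\n\n")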
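
The omitted poll loop depends on how the job is fetched, which is covered in the RAG Knowledge Base recipe rather than here. As a generic sketch only, the waiting logic could look like the helper below. `poll_until`, `fetch`, and `is_done` are hypothetical names: the caller supplies the function that retrieves the job and the predicate that decides when it is finished, so nothing here assumes a particular endpoint or status value.

```python
import time


def poll_until(fetch, is_done, interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() repeatedly until is_done(result) is true or the timeout elapses.

    Hypothetical helper: `fetch` should wrap whatever request returns the
    current job payload; `is_done` inspects that payload.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        sleep(interval)  # wait before asking again
```

Passing `sleep` and `clock` as parameters keeps the helper testable without real waiting. A bounded timeout matters in CI, where a stuck job should fail the run instead of hanging it.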

Tips

Add a size limit: an llm.txt above ~1 MB strains token budgets
Sort by URL or by section for stable diffs between runs
Combine with a CI job to keep llm.txt up to date when the source docs change
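
These three tips can be combined in a small amount of code. The sketch below is one possible approach, not the recipe's official implementation: `build_llm_txt`, `write_if_changed`, and `MAX_BYTES` are illustrative names, and the result shape (`url`, `status`, `markdown`) is taken from the implementation above.

```python
from pathlib import Path

MAX_BYTES = 1_000_000  # assumed cap, roughly the ~1 MB mentioned in the tips


def build_llm_txt(results, max_bytes=MAX_BYTES):
    """Join succeeded pages in URL order, skipping pages that would exceed the cap."""
    pages = sorted(
        (r for r in results if r["status"] == "SUCCEEDED"),
        key=lambda r: r["url"],  # deterministic order gives stable diffs
    )
    parts, size, skipped = [], 0, []
    for r in pages:
        section = f"# {r['url']}\n\n{r['markdown']}\n\n---\n\n"
        n = len(section.encode("utf-8"))  # measure bytes, not characters
        if size + n > max_bytes:
            skipped.append(r["url"])  # report instead of silently truncating
            continue
        parts.append(section)
        size += n
    return "".join(parts), skipped


def write_if_changed(path, text):
    """Write text only when it differs from the file on disk; return True if written."""
    p = Path(path)
    if p.exists() and p.read_text(encoding="utf-8") == text:
        return False
    p.write_text(text, encoding="utf-8")
    return True
```

In a CI job, the return value of `write_if_changed` can decide whether to commit or publish a new llm.txt, and a non-empty `skipped` list is a signal to tighten the URL filter or raise the cap.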

Related

RAG Knowledge Base: same data, stored in a vector store instead of a flat file
Distill vs Extract

This recipe will be expanded with chunking and deduplication strategies; check back soon.
