# Docs to llm.txt

Convert any documentation site into a single LLM-ready Markdown file.

Distill an entire docs site into one llm.txt you can paste into any LLM context, RAG pipeline, or local model. Useful for unfamiliar libraries, internal wikis, and product docs.
## Flow
- Distill the index page with `include: ["links"]` to discover all linked URLs
- Filter the link list by URL pattern (e.g. `/docs/`, `/guide/`)
- Feed the filtered URLs into `/batch/distill`
- Concatenate the resulting Markdown into a single file
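The filter step above can be sketched as a small pure helper. The patterns and the function name here are illustrative choices, not part of the API; deduplicating and sorting keeps the output stable across runs:

```python
import re

DOC_PATTERNS = [r"/docs/", r"/guide/"]  # adjust to your site's URL layout

def filter_doc_links(links, patterns=DOC_PATTERNS):
    """Keep links matching any docs pattern, deduplicated and sorted.

    Stable, sorted output means the downstream llm.txt diffs cleanly
    between runs even if the index page reorders its links.
    """
    pat = re.compile("|".join(patterns))
    return sorted({u for u in links if pat.search(u)})
```

Anchoring the patterns to path segments (leading and trailing slashes) avoids accidentally matching query strings or unrelated pages that merely mention "docs".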
## Implementation
```python
import re

import httpx

API = "https://openapi.thunderbit.com/openapi/v1"
H = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Pull the index page + outbound links
index = httpx.post(f"{API}/distill",
                   headers=H,
                   json={"url": "https://docs.example.com",
                         "include": ["links"]}).json()["data"]

# 2. Filter to docs paths
doc_urls = [u for u in index["links"] if re.search(r"/docs/", u)]

# 3. Batch distill
job = httpx.post(f"{API}/batch/distill",
                 headers=H,
                 json={"urls": doc_urls}).json()["data"]

# 4. Poll, concatenate
# (poll loop omitted; see RAG Knowledge Base recipe)
with open("llm.txt", "w") as f:
    for r in job["results"]:
        if r["status"] == "SUCCEEDED":
            f.write(f"# {r['url']}\n\n{r['markdown']}\n\n---\n\n")
```
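The omitted poll loop in step 4 can be sketched as a generic helper. It takes a zero-arg callable that fetches the current job dict, so it works with whatever status endpoint your API exposes; the `PENDING`/`RUNNING` state names are assumptions to check against your API reference:

```python
import time

def wait_for_job(fetch_status, interval=2.0, timeout=600.0):
    """Poll fetch_status() until the batch job leaves an in-progress state.

    fetch_status is any zero-arg callable returning the job dict, e.g.
      lambda: httpx.get(f"{API}/batch/distill/{job_id}", headers=H).json()["data"]
    (that status URL is a guess -- substitute the real endpoint).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] not in ("PENDING", "RUNNING"):
            return job  # SUCCEEDED, FAILED, or any other terminal state
        time.sleep(interval)
    raise TimeoutError("batch distill job did not finish in time")
```

Passing a callable rather than a URL keeps the helper testable and decoupled from the HTTP client.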
## Tips

- Add a size cap: `llm.txt` over ~1 MB starts to bloat token budgets
- Sort by URL or by section for stable diffs across runs
- Pair with a CI job to keep `llm.txt` fresh as the source docs change
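The size-cap and stable-sort tips can be combined in one concatenation helper. This is a minimal sketch assuming the batch results reduce to `(url, markdown)` pairs; the names are illustrative:

```python
MAX_BYTES = 1_000_000  # ~1 MB cap on llm.txt

def concat_with_cap(sections, max_bytes=MAX_BYTES):
    """Join (url, markdown) pairs into one llm.txt string.

    Sections are sorted by URL so repeated runs produce stable diffs,
    and concatenation stops before the output would exceed max_bytes.
    """
    out, size = [], 0
    for url, md in sorted(sections):
        chunk = f"# {url}\n\n{md}\n\n---\n\n"
        n = len(chunk.encode("utf-8"))
        if size + n > max_bytes:
            break  # hard stop keeps the cap deterministic
        out.append(chunk)
        size += n
    return "".join(out)
```

Measuring the cap in encoded bytes (not characters) keeps the limit meaningful for docs with non-ASCII content.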
## Related
- RAG Knowledge Base — same data, vector store instead of flat file
- Distill vs Extract
This recipe is being expanded with chunking and deduplication strategies — check back soon.