Recipes

Docs to llm.txt

Convert any documentation site into a single LLM-ready Markdown file

Distill an entire docs site into a single llm.txt you can drop into an LLM's context window, a RAG pipeline, or a local model. Useful for unfamiliar libraries, internal wikis, and product docs.

Flow

  1. Distill the index page with include: ["links"] to discover all linked URLs
  2. Filter the link list by URL pattern (e.g. /docs/, /guide/)
  3. Feed the filtered URLs into /batch/distill
  4. Concatenate the resulting Markdown into a single file

Implementation

import re

import httpx

API = "https://openapi.thunderbit.com/openapi/v1"
H = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Pull the index page + outbound links
index = httpx.post(f"{API}/distill",
                   headers=H,
                   timeout=60,  # distilling a full page can exceed httpx's 5 s default
                   json={"url": "https://docs.example.com",
                         "include": ["links"]}).json()["data"]

# 2. Filter to docs paths
doc_urls = sorted({u for u in index["links"] if re.search(r"/docs/", u)})  # dedupe, stable order

# 3. Batch distill
job = httpx.post(f"{API}/batch/distill",
                 headers=H,
                 json={"urls": doc_urls}).json()["data"]

# 4. Poll, concatenate
# (poll loop omitted; see RAG Knowledge Base recipe)

with open("llm.txt", "w", encoding="utf-8") as f:
    for r in job["results"]:
        if r["status"] == "SUCCEEDED":
            f.write(f"# {r['url']}\n\n{r['markdown']}\n\n---\n\n")
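The omitted poll loop can be sketched as a small generic helper. The terminal state names and the idea of a per-job status endpoint are assumptions here, so `fetch` is left as a parameter: pass in whatever HTTP call the batch API actually exposes (e.g. a GET on the job by id).

```python
import time


def wait_for_job(job_id, fetch, interval=5.0, max_polls=120):
    """Poll `fetch(job_id)` until the job reaches a terminal state.

    `fetch` is the recipe's HTTP call, e.g.
    lambda jid: httpx.get(f"{API}/batch/distill/{jid}", headers=H).json()["data"]
    -- that status path is an assumption; check the API reference.
    """
    for _ in range(max_polls):
        job = fetch(job_id)
        # "SUCCEEDED" / "FAILED" as terminal states is assumed from the
        # per-result statuses used above.
        if job.get("status") in ("SUCCEEDED", "FAILED"):
            return job
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")
```

Injecting `fetch` keeps the loop testable and decoupled from the exact endpoint shape.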

Tips

  • Add a size cap — llm.txt over ~1 MB starts to bloat token budgets
  • Sort by URL or by section for stable diffs across runs
  • Pair with a CI job to keep llm.txt fresh as the source docs change

This recipe is being expanded with chunking and deduplication strategies — check back soon.