# Docs to llm.txt

Convert any documentation site into a single LLM-ready Markdown file.

Distill an entire docs site into one llm.txt you can paste into any LLM context, RAG pipeline, or local model. Useful for unfamiliar libraries, internal wikis, and product docs.
## Flow
- Distill the index page with `include: ["links"]` to discover all linked URLs
- Filter the link list by URL pattern (e.g. `/docs/`, `/guide/`)
- Feed the filtered URLs into `/batch/distill`
- Concatenate the resulting Markdown into a single file
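The filter step above can be sketched as a small pure helper. The patterns and the function name here are illustrative choices, not part of the API; deduplicating and sorting keeps the output stable across runs:

```python
import re

DOC_PATTERNS = [r"/docs/", r"/guide/"]  # adjust to your site's URL layout

def filter_doc_links(links, patterns=DOC_PATTERNS):
    """Keep links matching any docs pattern, deduplicated and sorted.

    Stable, sorted output means the downstream llm.txt diffs cleanly
    between runs even if the index page reorders its links.
    """
    pat = re.compile("|".join(patterns))
    return sorted({u for u in links if pat.search(u)})
```

Anchoring the patterns to path segments (leading and trailing slashes) avoids accidentally matching query strings or unrelated pages that merely mention "docs".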
## Implementation
```python
import re

import httpx

API = "https://openapi.thunderbit.com/openapi/v1"
H = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Pull the index page + outbound links
index = httpx.post(f"{API}/distill",
                   headers=H,
                   json={"url": "https://docs.example.com",
                         "include": ["links"]}).json()["data"]

# 2. Filter to docs paths
doc_urls = [u for u in index["links"] if re.search(r"/docs/", u)]

# 3. Batch distill
job = httpx.post(f"{API}/batch/distill",
                 headers=H,
                 json={"urls": doc_urls}).json()["data"]

# 4. Poll, concatenate
# (poll loop omitted; see RAG Knowledge Base recipe)
with open("llm.txt", "w") as f:
    for r in job["results"]:
        if r["status"] == "SUCCEEDED":
            f.write(f"# {r['url']}\n\n{r['markdown']}\n\n---\n\n")
```
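The omitted poll loop in step 4 can be sketched as a generic helper. It takes a zero-arg callable that fetches the current job dict, so it works with whatever status endpoint your API exposes; the `PENDING`/`RUNNING` state names are assumptions to check against your API reference:

```python
import time

def wait_for_job(fetch_status, interval=2.0, timeout=600.0):
    """Poll fetch_status() until the batch job leaves an in-progress state.

    fetch_status is any zero-arg callable returning the job dict, e.g.
      lambda: httpx.get(f"{API}/batch/distill/{job_id}", headers=H).json()["data"]
    (that status URL is a guess -- substitute the real endpoint).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] not in ("PENDING", "RUNNING"):
            return job  # SUCCEEDED, FAILED, or any other terminal state
        time.sleep(interval)
    raise TimeoutError("batch distill job did not finish in time")
```

Passing a callable rather than a URL keeps the helper testable and decoupled from the HTTP client.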
## Tips

- Add a size cap: `llm.txt` over ~1 MB starts to bloat token budgets
- Sort by URL or by section for stable diffs across runs
- Pair with a CI job to keep `llm.txt` fresh as the source docs change
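The size-cap and stable-sort tips can be combined in one concatenation helper. This is a minimal sketch assuming the batch results reduce to `(url, markdown)` pairs; the names are illustrative:

```python
MAX_BYTES = 1_000_000  # ~1 MB cap on llm.txt

def concat_with_cap(sections, max_bytes=MAX_BYTES):
    """Join (url, markdown) pairs into one llm.txt string.

    Sections are sorted by URL so repeated runs produce stable diffs,
    and concatenation stops before the output would exceed max_bytes.
    """
    out, size = [], 0
    for url, md in sorted(sections):
        chunk = f"# {url}\n\n{md}\n\n---\n\n"
        n = len(chunk.encode("utf-8"))
        if size + n > max_bytes:
            break  # hard stop keeps the cap deterministic
        out.append(chunk)
        size += n
    return "".join(out)
```

Measuring the cap in encoded bytes (not characters) keeps the limit meaningful for docs with non-ASCII content.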
## Related
- RAG Knowledge Base — same data, vector store instead of flat file
- Distill vs Extract
This recipe is being expanded with chunking and deduplication strategies — check back soon.