Turn a Docs Site into llm.txt

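Distill an entire documentation site into a single llm.txt that you can paste straight into any LLM context, RAG pipeline, or local model. This is especially useful for unfamiliar libraries, internal wikis, and product docs.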

Workflow

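Distill the index page with include: ["links"] to collect all outbound links
Filter the link list by URL pattern (for example /docs/ or /guide/)
Send the filtered URLs to /batch/distill
Concatenate the resulting Markdown into a single file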
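
The filtering step can be sketched a little more defensively than a single regex. The helper below is a minimal illustration, not part of the API: the name filter_doc_urls, its parameters, and the default patterns are hypothetical, and it assumes the collected links are absolute URL strings. It keeps only same-host links whose path matches a pattern, strips #fragment anchors, and removes duplicates while preserving order.

```python
import re
from urllib.parse import urldefrag, urlparse


def filter_doc_urls(links, base_host, patterns=(r"^/docs/", r"^/guide/")):
    """Keep same-host links whose path matches a pattern; dedupe, keep order.

    Hypothetical helper: assumes `links` is a list of absolute URL strings.
    """
    compiled = [re.compile(p) for p in patterns]
    seen = set()
    kept = []
    for link in links:
        url, _fragment = urldefrag(link)  # drop "#section" anchors
        parsed = urlparse(url)
        if parsed.netloc != base_host:
            continue  # skip links that leave the docs site
        if not any(p.search(parsed.path) for p in compiled):
            continue  # skip paths outside the docs sections
        if url not in seen:
            seen.add(url)
            kept.append(url)
    return kept
```

Dropping fragments before deduplicating avoids sending the same page to the batch endpoint several times just because the index links to different anchors on it.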

Implementation

import re

import httpx

API = "https://openapi.thunderbit.com/openapi/v1"
H = {"Authorization": "Bearer YOUR_API_KEY"}

# 1. Pull the index page + outbound links
index = httpx.post(f"{API}/distill",
                   headers=H,
                   json={"url": "https://docs.example.com",
                         "include": ["links"]}).json()["data"]

# 2. Filter to docs paths
doc_urls = [u for u in index["links"] if re.search(r"/docs/", u)]

# 3. Batch distill
job = httpx.post(f"{API}/batch/distill",
                 headers=H,
                 json={"urls": doc_urls}).json()["data"]

# 4. Poll, concatenate
# (poll loop omitted; see RAG Knowledge Base recipe)
# The code below expects `job` to be the finished job returned by that
# poll loop, with per-URL results available under job["results"].

# Write UTF-8 explicitly so non-ASCII docs survive on every platform
with open("llm.txt", "w", encoding="utf-8") as f:
    for r in job["results"]:
        if r["status"] == "SUCCEEDED":
            f.write(f"# {r['url']}\n\n{r['markdown']}\n\n---\n\n")

Tips

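Add a size cap: once llm.txt grows past roughly 1 MB, it eats too much of the token budget
Sort by URL or section so diffs stay stable across runs
Pair it with a CI job so llm.txt is refreshed automatically whenever the source docs change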
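
The first two tips can be combined in the concatenation step. The following is a minimal sketch, not a prescribed implementation: build_llm_txt and its max_bytes parameter are hypothetical names, and it assumes each result is a dict with the url, status, and markdown keys used in the implementation above. It sorts successful results by URL and stops adding pages once the byte cap would be exceeded.

```python
def build_llm_txt(results, max_bytes=1_000_000):
    """Concatenate successful results in URL order, stopping at a byte cap.

    Hypothetical helper: assumes each result has "url", "status", "markdown".
    """
    ok = [r for r in results if r["status"] == "SUCCEEDED"]
    ok.sort(key=lambda r: r["url"])  # deterministic order keeps diffs stable
    parts = []
    size = 0
    for r in ok:
        section = f"# {r['url']}\n\n{r['markdown']}\n\n---\n\n"
        section_size = len(section.encode("utf-8"))  # count bytes, not chars
        if size + section_size > max_bytes:
            break  # drop the remaining pages rather than cut one in half
        parts.append(section)
        size += section_size
    return "".join(parts)
```

Stopping at a page boundary keeps every included page intact. Because pages are sorted by URL, the ones dropped are those that sort last, so a different sort key may be preferable if some sections matter more than others.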
