LangChain
Embed Thunderbit in a LangChain agent as a Document loader or a tool
Plug Thunderbit into a LangChain pipeline either as a Document loader (for RAG ingestion) or as a Tool (so the agent can look things up on the web by itself).
Installation
pip install langchain-core httpx
As a Document loader
import time

import httpx
from langchain_core.documents import Document

API = "https://openapi.thunderbit.com/openapi/v1"
H = {"Authorization": "Bearer YOUR_API_KEY"}


class ThunderbitLoader:
    def __init__(self, urls: list[str]):
        self.urls = urls

    def load(self) -> list[Document]:
        # 1. Submit the batch. POST returns {id, status, total, ...}; no `results` yet.
        resp = httpx.post(f"{API}/batch/distill", headers=H, json={"urls": self.urls})
        resp.raise_for_status()  # fail early on auth or quota errors
        job_id = resp.json()["data"]["id"]

        # 2. Poll until the job reaches a terminal status (see the Batch Job Lifecycle guide).
        while True:
            resp = httpx.get(f"{API}/batch/distill/{job_id}", headers=H)
            resp.raise_for_status()
            data = resp.json()["data"]
            if data["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
                break
            time.sleep(5)

        # 3. Read results from the final GET response.
        #    Each item: {index, url, status, markdown, error}.
        return [
            Document(page_content=r["markdown"], metadata={"source": r["url"]})
            for r in data.get("results", [])
            if r["status"] == "SUCCEEDED"
        ]
docs = ThunderbitLoader(["https://docs.example.com"]).load()

From here, feed docs into your usual LangChain text splitter and vector store.
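The polling loop in `load()` has no upper bound, so a job that never reaches a terminal status would block forever. A minimal sketch of a bounded version follows, assuming the same three terminal statuses as the loader. `poll_until_terminal`, `fetch_status`, and the 600-second default are hypothetical names and values chosen for illustration, not part of Thunderbit's API.

```python
import time
from typing import Any, Callable

# Terminal statuses copied from the loader example; adjust if the API defines others.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}


def poll_until_terminal(
    fetch_status: Callable[[], dict[str, Any]],
    interval: float = 5.0,
    max_wait: float = 600.0,  # hypothetical default; tune for your batch sizes
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call fetch_status() until it reports a terminal status or max_wait elapses."""
    deadline = clock() + max_wait
    while True:
        data = fetch_status()
        if data["status"] in TERMINAL:
            return data
        # Give up if the next sleep would take us past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job still {data['status']!r} after {max_wait}s")
        sleep(interval)
```

Inside `load()`, this could stand in for the `while True` loop, for example `data = poll_until_terminal(lambda: httpx.get(f"{API}/batch/distill/{job_id}", headers=H).json()["data"])`. Passing `sleep` and `clock` as parameters keeps the helper testable without real waiting.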
As an agent tool
from langchain_core.tools import tool

# Reuses httpx, API, and H from the loader example above.


@tool
def read_url(url: str) -> str:
    """Fetch a URL and return clean Markdown for the agent to read.

    Use for any web research task: docs, articles, search results, product pages.
    """
    resp = httpx.post(
        f"{API}/distill",
        headers=H,
        json={"url": url, "renderMode": "basic"},
        timeout=60.0,
    )
    resp.raise_for_status()
    return resp.json()["data"]["markdown"]
# Pass [read_url] into create_react_agent / AgentExecutor / etc.
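Whatever `read_url` returns is fed back into the model's context, so a very long page can crowd out the rest of the conversation. One possible guard is to clip the Markdown before returning it. The helper below is only a sketch: `clip_markdown`, the 8000-character default, and the `[truncated]` marker are assumptions chosen for illustration, not anything Thunderbit or LangChain prescribes.

```python
def clip_markdown(markdown: str, max_chars: int = 8000) -> str:
    """Keep at most max_chars of the body, then append a truncation marker.

    Cuts at the last paragraph break before the limit when one exists,
    otherwise cuts at exactly max_chars. Short inputs are returned unchanged.
    """
    if len(markdown) <= max_chars:
        return markdown
    cut = markdown.rfind("\n\n", 0, max_chars)
    if cut <= 0:
        # No paragraph break in range: fall back to a hard cut.
        cut = max_chars
    return markdown[:cut].rstrip() + "\n\n[truncated]"
```

In the tool, the last line would then become `return clip_markdown(resp.json()["data"]["markdown"])`. Cutting at a paragraph boundary avoids handing the model a sentence that stops halfway.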
This integration will be expanded into a langchain-thunderbit package. Stay tuned.