LangChain

Plug Thunderbit into a LangChain pipeline as a document loader (for RAG ingestion) or as a tool (for agent-driven web research).

Installation

pip install langchain-core httpx

As a document loader

from langchain_core.documents import Document
import httpx, time

API = "https://openapi.thunderbit.com/openapi/v1"
H = {"Authorization": "Bearer YOUR_API_KEY"}

class ThunderbitLoader:
    def __init__(self, urls: list[str]):
        self.urls = urls

    def load(self) -> list[Document]:
        # 1. submit batch: POST returns {id, status, total, ...}, no `results` yet
        submit = httpx.post(f"{API}/batch/distill", headers=H, json={"urls": self.urls})
        submit.raise_for_status()  # surface HTTP errors instead of a KeyError below
        job_id = submit.json()["data"]["id"]
        # 2. poll until terminal (see the Batch Job Lifecycle guide)
        while True:
            poll = httpx.get(f"{API}/batch/distill/{job_id}", headers=H)
            poll.raise_for_status()
            data = poll.json()["data"]
            if data["status"] in ("COMPLETED", "FAILED", "CANCELLED"):
                break
            time.sleep(5)  # wait between polls
        # 3. read results from GET response (each item: {index, url, status, markdown, error})
        return [
            Document(page_content=r["markdown"], metadata={"source": r["url"]})
            for r in data.get("results", []) if r["status"] == "SUCCEEDED"
        ]

docs = ThunderbitLoader(["https://docs.example.com"]).load()

Pass docs on to your usual LangChain text splitter and vector store.
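The polling loop in the loader above runs until the job reaches a terminal status, so a job that never finishes would block `load()` indefinitely. The following is a minimal sketch of a bounded variant, not part of the Thunderbit API: `poll_until_terminal`, its `interval`/`timeout` defaults, and the injectable `fetch` and `sleep` callables are all hypothetical names chosen for illustration. Only the three terminal statuses are taken from the loader code.

```python
import time
from typing import Callable

# Terminal statuses as used in the loader example above.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED"}


def poll_until_terminal(
    fetch: Callable[[], dict],
    interval: float = 5.0,    # assumed pause between polls, in seconds
    timeout: float = 600.0,   # assumed overall budget, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until its "status" is terminal or the deadline would be exceeded."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch()
        if data["status"] in TERMINAL:
            return data
        # Give up if waiting one more interval would pass the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"job still {data['status']} after {timeout}s")
        sleep(interval)
```

Inside `load()`, the `while True` loop could then be replaced by `data = poll_until_terminal(lambda: httpx.get(f"{API}/batch/distill/{job_id}", headers=H).json()["data"])`. Passing `fetch` and `sleep` as callables keeps the helper free of HTTP details and makes it easy to exercise without network access.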

As an agent tool

import httpx
from langchain_core.tools import tool

# API and H are the constants defined in the document loader example above.

@tool
def read_url(url: str) -> str:
    """Fetch a URL and return clean Markdown for the agent to read.

    Use for any web research task: docs, articles, search results, product pages.
    """
    resp = httpx.post(f"{API}/distill",
                      headers=H,
                      json={"url": url, "renderMode": "basic"},
                      timeout=60.0)
    resp.raise_for_status()
    return resp.json()["data"]["markdown"]

# Pass [read_url] into create_react_agent / AgentExecutor / etc.
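As written, `read_url` raises on any HTTP error and returns the full Markdown regardless of length. Depending on the agent framework, an exception may abort the run, and a very long page may exceed the model's context window. The sketch below shows one way to soften both cases. `safe_read`, `MAX_CHARS`, and the error-message format are hypothetical and not part of LangChain or Thunderbit; the character budget is an assumption to tune per model.

```python
from typing import Callable

MAX_CHARS = 20_000  # assumed budget; tune to the model's context window


def safe_read(fetch: Callable[[str], str], url: str, max_chars: int = MAX_CHARS) -> str:
    """Run fetch(url); return its text (truncated if long) or an error message."""
    try:
        text = fetch(url)
    except Exception as exc:  # broad on purpose: the agent only needs a readable message
        return f"ERROR fetching {url}: {type(exc).__name__}: {exc}"
    if len(text) > max_chars:
        # Tell the agent how much was cut so it can decide whether to look elsewhere.
        return text[:max_chars] + f"\n\n[truncated {len(text) - max_chars} characters]"
    return text
```

One way to use it would be to move the HTTP call from `read_url` into a plain helper function and have the tool body return `safe_read(helper, url)`. Returning the error as text lets the agent see the failure and try a different URL instead of stopping.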

Related

This integration will be extended with a langchain-thunderbit package. Check back soon.
