superlinked · svonava · Jun 23, 2026
diff --git a/examples/README.md b/examples/README.md
@@ -20,6 +20,7 @@ service keys.
 | [Swap an OCR model with one identifier change](./document-ocr) | Driving recognition (VLM-OCR), structured extraction (Donut), and zero-shot NER (GLiNER) through the same `extract` call by swapping the model ID | `extract` | Docker Compose plus Node UI, no API key required, hosted version on [Hugging Face Spaces](https://huggingface.co/spaces/superlinked/document-ocr) | Runnable demo |
 | [A Stripe Link checkout with an SIE fraud-risk gate](./stripe-link-fraud) | Wiring all three SIE primitives into a pre-authorization fraud-risk gate that runs in the same round-trip as the Stripe PaymentIntent | `extract`, `encode`, `score` | Docker Compose plus Node UI; Stripe test-mode keys optional (runs in mock mode without them) | Runnable demo |
 | [Vision-first document RAG](./vision-doc-rag) | Retrieving and answering questions over a multi-tenant page corpus by looking at page images — including scanned drawings — with OCR kept out of the score path | `encode`, `chat/completions`, `score` (optional) | GPU SIE deployment required: ColQwen2.5 retriever + Qwen3.5-4B answer model (runs on the generation bundle) | Runnable demo |
+| [Multi-model contract review with the OpenAI Agents SDK](./contract-review-agent) | Running an OpenAI Agents SDK agent whose every model call — triage, orchestration, vision, OCR, embeddings, rerank, entity extraction, text-to-SQL, reasoning, and a safety guardrail — is served by one SIE cluster, each step on the right catalog model, with per-model observability | `chat/completions`, `encode`, `score`, `extract` | GPU SIE deployment required; standalone `uv` project; real contracts fetched from CUAD (CC BY 4.0) | Runnable demo |
 
 For docs publishing, lead with the quickest runnable demos, then use the
 benchmark and evaluation examples for deeper technical users.

diff --git a/examples/contract-review-agent/.env.example b/examples/contract-review-agent/.env.example
@@ -0,0 +1,4 @@
+# Point these at any SIE deployment that serves /v1/chat/completions (a GPU
+# cluster — see README "Run it"). Defaults match a local CUDA container.
+SIE_CLUSTER_URL=http://localhost:8080
+SIE_API_KEY=
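
Review note on the override described in `.env.example`: the loader in `runtime.py` is not part of this hunk, so the following is only a sketch of one plausible precedence rule, assuming a non-empty environment variable wins over the `cluster:` block in `config.yaml`. The function name `resolve_cluster` and its `env` parameter are hypothetical.

```python
from __future__ import annotations

import os
from typing import Mapping

DEFAULT_URL = "http://localhost:8080"  # matches the default in .env.example


def resolve_cluster(
    cfg_cluster: Mapping[str, str],
    env: Mapping[str, str] | None = None,
) -> dict[str, str]:
    """Return cluster settings; non-empty env vars override config.yaml values.

    Hypothetical helper: the real loader may use a different precedence.
    """
    env = os.environ if env is None else env
    # `or` treats an empty string (e.g. a bare `SIE_API_KEY=`) as "not set".
    url = env.get("SIE_CLUSTER_URL") or cfg_cluster.get("url") or DEFAULT_URL
    api_key = env.get("SIE_API_KEY") or cfg_cluster.get("api_key") or ""
    return {"url": url.rstrip("/"), "api_key": api_key}
```

Treating an empty value as unset keeps the blank `SIE_API_KEY=` line in `.env.example` from masking a key set in `config.yaml`.
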
diff --git a/examples/contract-review-agent/.gitignore b/examples/contract-review-agent/.gitignore
@@ -0,0 +1,8 @@
+.venv/
+__pycache__/
+.env
+*.log
+uv.lock
+.ruff_cache/
+# Generated sample artifacts (recreate with `uv run make-sample`)
+contract_review_agent/data/generated/
diff --git a/examples/contract-review-agent/README.md b/examples/contract-review-agent/README.md
@@ -0,0 +1,111 @@
+# Contract review with the OpenAI Agents SDK, on one SIE cluster
+
+A multi-agent contract reviewer built with the [OpenAI Agents SDK](https://openai.github.io/openai-agents-python/) where **every model call is served by SIE** — no `api.openai.com`, no per-token bill. An **investigator** agent autonomously calls tools to gather grounded facts, then a **synthesizer** agent turns them into a structured review — each step running on the **right model from the SIE catalog**: a fast triage model, a vision model that reads the scanned signature page, a reasoning sub-agent for clause risk, a text-to-SQL specialist, an OCR model, embedding + reranker models for clause search, a zero-shot entity extractor, and a safety guardrail. Ten specialized jobs, one cluster, one request.
+
+This is the "one cluster powers every model your agent calls" idea from the [SIE landing page](https://superlinked.com), made real and runnable.
+
+## The catalog: the right model for each job
+
+Every value below is a real model in the [SIE catalog](https://superlinked.com/models). Swap any line in `config.yaml` to try another — nothing else changes.
+
+| Role in the agent | SIE model | SIE function |
+|---|---|---|
+| Triage — classify the document type | `Qwen/Qwen3-0.6B` | chat |
+| **Orchestrator** — plan, call tools, assemble the review | `Qwen/Qwen3-4B-Instruct-2507` (alias `code`) | chat + tools + JSON schema |
+| Vision — read the scanned signature page | `Qwen/Qwen3.5-4B` | chat + image |
+| Reasoning sub-agent — clause-risk analysis | `Qwen/Qwen3-4B-Instruct-2507` (↑ `Qwen3.5-4B` / `Qwen3.6-27B` where served) | chat |
+| Text-to-SQL — query the obligations DB | `defog/sqlcoder-7b-2` | completions |
+| Guardrail — safety / prompt-injection | `ibm-granite/granite-guardian-3.0-2b` (alias `guard`) | chat |
+| OCR — scanned page → markdown | `lightonai/LightOnOCR-2-1B` | extract |
+| Clause search — dense embeddings | `BAAI/bge-m3` | encode |
+| Clause rerank — cross-encoder | `Qwen/Qwen3-Reranker-4B` | score |
+| Entity extraction — parties, dates, amounts | `urchade/gliner_large-v2.1` | extract |
+
+## How it works
+
+The whole trick is one idea: **the Agents SDK speaks the OpenAI wire protocol, and SIE serves an OpenAI-compatible `/v1` endpoint.** So we point the SDK at SIE and force chat completions (`contract_review_agent/runtime.py`):
+
+```python
+client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")
+set_default_openai_client(client)        # every agent talks to SIE...
+set_default_openai_api("chat_completions")  # ...over chat completions, not the Responses API...
+set_tracing_disabled(True)               # ...and we never phone home with traces.
+```
+
+After that, each `Agent` just names the SIE model it should run on:
+
+```python
+Agent(name="Risk Analyst", model=OpenAIChatCompletionsModel("Qwen/Qwen3-4B-Instruct-2507", openai_client=client), ...)
+```
+
+The flow is **two agents** (which is what keeps a small open model reliable):
+
+1. An **investigator** (on `Qwen/Qwen3-4B-Instruct-2507`) with seven tools and **no** structured `output_type`, so it can't short-circuit to a hallucinated answer and must call tools to learn anything about the contract:
+   - Generative LLMs: `classify_document` (triage) · `read_signature_page` (vision) · `analyze_clause_risks` (delegates to the reasoning **sub-agent**)
+   - Retrieval and extraction: `ocr_signature_page` and `extract_entities` (`extract`) · `search_clauses` (`encode` + `score`) · `query_obligations_db` (`completions`)
+   - a `granite-guardian` **input guardrail** screens the request first (and fails open, logged, if the guard model is unavailable).
+2. A **synthesizer** (structured `output_type=ContractReview`, no tools) turns the investigator's grounded findings into the final review — parties, dates, governing law, executed?, key obligations, risk flags with severity + redlines, recommendation — via SIE's JSON-schema-constrained generation.
+
+> Why two agents? With a structured `output_type`, a small model tends to emit the schema immediately and skip the tools (it will even hallucinate the fields). Splitting "gather with tools" from "format the result" keeps the fan-out real and the output grounded.
+
+## Run it
+
+You need Python 3.12 and a **GPU-backed SIE deployment** — the generative models run on SIE's generation bundle (CUDA), so the `latest-cpu-default` image can't serve them.
+
+```bash
+# 1. SIE on a local NVIDIA GPU, or point SIE_CLUSTER_URL / SIE_API_KEY at a managed GPU cluster.
+docker run --gpus all -p 8080:8080 -v sie-hf-cache:/app/.cache/huggingface \
+  ghcr.io/superlinked/sie-server:latest-cuda12-default
+
+cd examples/contract-review-agent
+cp .env.example .env          # edit SIE_CLUSTER_URL / SIE_API_KEY if not localhost
+uv sync
+
+# 2. Fetch a handful of real contracts from CUAD (CC BY 4.0). Downloads a ~18 MB archive once.
+uv run fetch-contracts                 # or: uv run make-sample  (offline synthetic contracts)
+
+# 3. Review the first contract and watch the model fan-out.
+uv run review                          # uv run review --list   to see available contracts
+uv run review --contract <slug>        # review a specific one
+```
+
+> **GPU sizing.** `reasoning` defaults to `Qwen/Qwen3-4B-Instruct-2507` (reliable, fast) so the demo
+> runs on a single mid-size GPU; swap in the newer `Qwen/Qwen3.5-4B` or the stronger `Qwen/Qwen3.6-27B`
+> (H100 / RTX PRO 6000) where the cluster serves them. A cold cluster pays a one-time load per model on
+> first use; the agent retries "still provisioning" responses for up to `cluster.provision_timeout_s`
+> seconds. Keep bundles warm (`minReplicas: 1`) to skip the wait. Any model the cluster can't serve
+> degrades gracefully (logged in the ledger) instead of failing the run.
+
+## What you'll see
+
+`uv run review` prints the model catalog, runs the agent, then prints the structured review **plus a per-model observability ledger** — each step's model, SIE function, **cold-start warm-up**, warm latency, data sent, and **warm throughput (tokens/s)** — so you can watch one cluster fan a single request across the catalog and see how each model performed. (Warm-up is shown separately from throughput for the generative calls; the `encode`/`score`/`extract` calls go through the SIE SDK, which provisions internally, so those show total latency.) Try `--instruction "..."` to change the ask, or feed the guardrail a malicious prompt to watch `granite-guardian` trip the tripwire.
+
+## Swapping models (the point of the catalog)
+
+`config.yaml` maps each role to a model id. Change a string, rerun — no code edits:
+
+```yaml
+models:
+  reasoning: "Qwen/Qwen3.6-27B"               # default 4B runs anywhere; bump to 27B on an H100-class cluster
+  ocr: "opendatalab/MinerU2.5-Pro-2604-1.2B"  # try a different OCR model
+```
+
+Alternatively, resolve roles **server-side** with SIE's gateway aliases — set
+`SIE_GATEWAY_MODEL_ALIASES='{"vision":"Qwen/Qwen3.5-4B","ocr":"lightonai/LightOnOCR-2-1B"}'`
+and reference `vision` / `ocr` (the built-ins `code`, `sql`, `guard` already ship).
+
+## Data
+
+The default corpus is **[CUAD](https://www.atticusprojectai.org/cuad/)** (Contract Understanding Atticus Dataset) — 510 real commercial contracts filed with the SEC, released by The Atticus Project under **CC BY 4.0**. `fetch-contracts` downloads CUAD's ~18 MB archive once (from the [Atticus Project repo](https://github.com/TheAtticusProject/cuad)), parses the SQuAD-format contract text, writes a curated handful as the corpus, renders one page to an image for the OCR/vision step, and seeds a small SQLite obligations database that references the contracts pulled.
+
+> CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review. Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball. arXiv:2103.06268. Licensed CC BY 4.0.
+
+`uv run make-sample` builds a fully synthetic, offline alternative (an Acme MSA, an NDA, and an SOW) so the demo runs with no network.
+
+## Notes
+
+- Chat completions, tool calling, JSON-schema structured output, vision, and `/v1/completions` (for `sqlcoder`) are all served over SIE's OpenAI-compatible API.
+- `sqlcoder-7b-2` is a completion model used with its native text-to-SQL template; for higher accuracy you can instead point `sql` at the `code`-aliased instruct model.
+- This is a demo of inference orchestration, **not legal advice**.
+
+Apache-2.0, like the rest of SIE.
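
Review note on the "GPU sizing" paragraph in this README: the retry code is not in this hunk, so here is a minimal sketch of what retrying "still provisioning" responses under `cluster.provision_timeout_s` could look like. `StillProvisioning` and `call_with_provision_retry` are hypothetical names, and the exponential backoff schedule is an assumption, not something the README specifies.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class StillProvisioning(Exception):
    """Hypothetical marker for a 'model is still loading' response."""


def call_with_provision_retry(
    call: Callable[[], T],
    timeout_s: float,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Retry `call` while the model is provisioning, up to `timeout_s` seconds."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        try:
            return call()
        except StillProvisioning:
            remaining = deadline - clock()
            if remaining <= 0:
                raise TimeoutError(f"model not ready after {timeout_s}s") from None
            # Never sleep past the deadline; double the wait up to max_delay_s.
            sleep(min(delay, remaining))
            delay = min(delay * 2, max_delay_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waits. Only the provisioning signal is retried, so other errors surface immediately.
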
diff --git a/examples/contract-review-agent/config.yaml b/examples/contract-review-agent/config.yaml
@@ -0,0 +1,36 @@
+# SIE deployment. /v1/chat/completions is served by SIE's generation bundle
+# (GPU), so use a CUDA image locally:
+#   docker run --gpus all -p 8080:8080 ghcr.io/superlinked/sie-server:latest-cuda12-default
+# or override with SIE_CLUSTER_URL / SIE_API_KEY to target a managed GPU cluster.
+cluster:
+  url: "http://localhost:8080"
+  api_key: ""
+  gpu: ""                  # only set for managed multi-GPU clusters (e.g. "l4-spot"); ignored locally
+  provision_timeout_s: 900 # cold clusters scale from zero, so the first call to each model pays a one-time load
+
+# ── The heart of the demo: one cluster, many models — the right one for each job. ──
+# Every value is a real model in the SIE catalog (https://superlinked.com/models).
+# Swap any line to try a different model; no other code changes.
+models:
+  # Generative LLMs — the agent "brains" (chat / tools / structured output):
+  triage: "Qwen/Qwen3-0.6B"                          # fast, cheap doc-type classifier (no tools)
+  orchestrator: "Qwen/Qwen3-4B-Instruct-2507"        # plans, calls tools, assembles output (alias: code)
+  vision: "Qwen/Qwen3.5-4B"                          # reads scanned / signature pages (text + image)
+  reasoning: "Qwen/Qwen3-4B-Instruct-2507"           # clause-risk analysis (reliable, fast); swap to newer Qwen/Qwen3.5-4B or stronger Qwen/Qwen3.6-27B where the cluster serves them
+  sql: "defog/sqlcoder-7b-2"                         # text-to-SQL specialist (completions + native template)
+  guard: "ibm-granite/granite-guardian-3.0-2b"       # safety / prompt-injection guardrail (alias: guard)
+  # Retrieval + extraction tools (encode / score / extract):
+  ocr: "lightonai/LightOnOCR-2-1B"                   # scanned PDF / image -> markdown
+  embed: "BAAI/bge-m3"                               # dense embeddings for clause search
+  rerank: "Qwen/Qwen3-Reranker-4B"                   # cross-encoder rerank of retrieved clauses
+  entities: "urchade/gliner_large-v2.1"              # zero-shot entity extraction (parties, dates, amounts)
+
+# Tunables for the SIE-backed tools.
+search:
+  top_k_candidates: 12   # clauses retrieved by embedding similarity
+  top_k_results: 4       # clauses kept after rerank
+
+guard:
+  # granite-guardian emits a "yes" (unsafe) / "no" (safe) verdict. Trip the
+  # guardrail when P(unsafe) clears this threshold. 0.5 is recall-biased.
+  threshold: 0.5
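
Review note on `guard.threshold`: `guardrails.py` is not in this hunk, so how P(unsafe) is derived is not visible here. The sketch below assumes it comes from the log-probabilities of the "yes" and "no" verdict tokens, renormalized over just those two outcomes. The names `p_unsafe` and `should_trip` are hypothetical, as is reading "clears" as `>=`.

```python
import math


def p_unsafe(yes_logprob: float, no_logprob: float) -> float:
    """Two-way softmax over the 'yes' (unsafe) and 'no' (safe) token logprobs."""
    # Subtract the max before exponentiating for numerical stability.
    top = max(yes_logprob, no_logprob)
    yes = math.exp(yes_logprob - top)
    no = math.exp(no_logprob - top)
    return yes / (yes + no)


def should_trip(yes_logprob: float, no_logprob: float, threshold: float = 0.5) -> bool:
    """Trip the guardrail when P(unsafe) reaches the configured threshold."""
    return p_unsafe(yes_logprob, no_logprob) >= threshold
```

Under this reading, lowering `threshold` trips the guardrail on weaker "yes" signals (more recall, more false alarms), and raising it does the opposite.
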
diff --git a/examples/contract-review-agent/contract_review_agent/__init__.py b/examples/contract-review-agent/contract_review_agent/__init__.py
@@ -0,0 +1,6 @@
+"""Contract review with the OpenAI Agents SDK, served entirely by SIE.
+
+An investigator agent drives a specialist sub-agent and SIE-backed tools, and a
+synthesizer formats the findings, each on a model from the SIE catalog: the "one
+cluster powers every model your agent calls" story, made runnable.
+"""
diff --git a/examples/contract-review-agent/contract_review_agent/app.py b/examples/contract-review-agent/contract_review_agent/app.py
@@ -0,0 +1,114 @@
+"""Assemble the multi-agent app: an investigator and a synthesizer on the
+orchestrator model, a risk-analyst sub-agent on the reasoning model, SIE-backed
+tools, a safety guardrail, and a structured output type."""
+
+from __future__ import annotations
+
+from typing import Any
+
+from agents import Agent, RunResult, Runner
+from pydantic import BaseModel
+
+from .guardrails import safety_guardrail
+from .runtime import AppContext, model_for
+from .tools import ALL_TOOLS
+
+
+class RiskFlag(BaseModel):
+    clause: str
+    issue: str
+    severity: str  # low | medium | high
+    suggested_redline: str
+
+
+class ContractReview(BaseModel):
+    """The structured deliverable the synthesizer must produce."""
+
+    document_type: str
+    parties: list[str]
+    effective_date: str  # "unknown" if not stated
+    renewal_terms: str
+    governing_law: str  # "unknown" if not stated
+    executed: bool  # is the signature page signed and dated?
+    key_obligations: list[str]
+    risk_flags: list[RiskFlag]
+    recommendation: str
+
+
+# The investigator has NO output_type on purpose: a structured output_type gives a
+# weak model an escape hatch to emit the schema immediately instead of using tools.
+# With only tools available, it must call them to do its job.
+_INVESTIGATOR_INSTRUCTIONS = """\
+You are a contract investigator. You have NO prior knowledge of this contract — the
+ONLY way to learn anything is to CALL YOUR TOOLS. Investigate thoroughly: call EVERY
+one of these tools, one after another, before you write anything.
+
+- classify_document() — the document type
+- ocr_signature_page() — read the executed signature page (signatories, titles, date)
+- extract_entities() — parties, dates, amounts, governing law
+- read_signature_page("Are both parties' signatures present and dated?") — visual execution check
+- search_clauses("automatic renewal"), then search_clauses("limitation of liability"),
+  then search_clauses("indemnification"), then search_clauses("termination")
+- analyze_clause_risks(<the clause text you found>) — risk analysis with severities
+- query_obligations_db("upcoming obligations with due dates and amounts") — deadlines
+
+Do NOT write your report until you have called them all. Then write a thorough,
+factual findings report that cites ONLY what the tools returned. Never invent a party,
+date, number, or clause — if a tool failed, say so."""
+
+_SYNTHESIZER_INSTRUCTIONS = """\
+You turn a contract investigator's findings into a structured ContractReview. Use
+ONLY the findings provided — never add facts. If the findings don't establish a
+field, use "unknown" (or false for `executed`). Make key_obligations and risk_flags
+specific and grounded in the findings, and give a clear recommendation."""
+
+
+def build_reasoning_agent(cfg: dict[str, Any], client: Any) -> Agent:
+    return Agent(
+        name="Risk Analyst",
+        instructions=(
+            "You are a senior contracts attorney. Given contract clauses, identify "
+            "risks to the Customer. For each, state the clause, the issue, a severity "
+            "(low/medium/high), and a concrete one-line redline. Be specific and brief."
+        ),
+        model=model_for(cfg["models"]["reasoning"], client),
+    )
+
+
+def build_investigator(cfg: dict[str, Any], client: Any) -> Agent:
+    """Autonomous tool-using agent (no output_type) that gathers grounded findings."""
+    return Agent(
+        name="Contract Investigator",
+        instructions=_INVESTIGATOR_INSTRUCTIONS,
+        model=model_for(cfg["models"]["orchestrator"], client),
+        tools=ALL_TOOLS,
+        input_guardrails=[safety_guardrail],
+    )
+
+
+def build_synthesizer(cfg: dict[str, Any], client: Any) -> Agent:
+    """Structured-output agent (no tools) that formats the findings into a review."""
+    return Agent(
+        name="Contract Reviewer",
+        instructions=_SYNTHESIZER_INSTRUCTIONS,
+        model=model_for(cfg["models"]["orchestrator"], client),
+        output_type=ContractReview,
+    )
+
+
+async def run_review(
+    app: AppContext, investigator: Agent, synthesizer: Agent, instruction: str
+) -> tuple[RunResult, RunResult]:
+    """Investigate with tools (autonomous fan-out), then synthesize the structured review."""
+    gather = await Runner.run(
+        investigator,
+        f"{instruction}\n\nInvestigate the contract using your tools, then report your findings.",
+        context=app,
+        max_turns=20,
+    )
+    synth = await Runner.run(
+        synthesizer,
+        f"Investigator findings:\n\n{gather.final_output}\n\nProduce the ContractReview.",
+        context=app,
+    )
+    return gather, synth
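
Review note on `run_review`: the investigator prompt asks the model to call every tool, but nothing in this hunk verifies that it did before the synthesizer runs. The sketch below shows one possible coverage check. The tool names are taken from `_INVESTIGATOR_INSTRUCTIONS`, while `EXPECTED_TOOLS` and `missing_tools` are hypothetical. Collecting the called names from the run result is left out, since that depends on the Agents SDK's run-item types.

```python
from typing import Iterable, Sequence

# Tool names as listed in the investigator instructions.
EXPECTED_TOOLS: tuple[str, ...] = (
    "classify_document",
    "ocr_signature_page",
    "extract_entities",
    "read_signature_page",
    "search_clauses",
    "analyze_clause_risks",
    "query_obligations_db",
)


def missing_tools(
    called: Iterable[str],
    expected: Sequence[str] = EXPECTED_TOOLS,
) -> list[str]:
    """Return the expected tools that never appear in `called`, in prompt order."""
    seen = set(called)  # repeated calls (e.g. several search_clauses) count once
    return [name for name in expected if name not in seen]
```

A non-empty result could be logged in the ledger or appended to the findings, so the synthesizer marks the affected fields "unknown" rather than guessing.
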