diff --git a/.coderabbit.yaml b/.coderabbit.yaml new file mode 100644 index 0000000..ba412b2 --- /dev/null +++ b/.coderabbit.yaml @@ -0,0 +1,24 @@ +# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json +language: "en-US" + +reviews: + profile: "chill" + request_changes_workflow: false + high_level_summary: true + review_status: true + path_filters: + - "src/**" + - "tests/**" + path_instructions: + - path: "src/**" + instructions: | + Focus review on correctness, MCP tool behavior, runtime compatibility, + cache/index compatibility, packaging impact, and security boundaries. + Avoid comments about planning docs, release docs, or repository process + unless the changed source code makes those docs materially inaccurate. + - path: "tests/**" + instructions: | + Focus review on meaningful assertions, regression coverage, fixture + correctness, deterministic behavior, and avoiding network- or + environment-dependent tests unless the test is explicitly marked as an + integration smoke test. diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS new file mode 100644 index 0000000..158da79 --- /dev/null +++ b/.github/CODEOWNERS @@ -0,0 +1,49 @@ +# CODEOWNERS — forces maintainer review on forbidden-territory paths. +# +# Source of truth: AGENT-EXECUTION-PIPELINE.md §2 (Forbidden Territory), +# required by §10 (Pre-flight Checklist). +# +# For these rules to be ENFORCED, branch protection on `main` must enable +# "Require review from Code Owners". CODEOWNERS alone only requests review; +# branch protection is what blocks merge. +# +# Autonomous agents may NOT modify these paths without explicit human approval +# (pipeline §2). Any agent PR touching them must add the `🛑 needs-human-review` +# label and stop short of requesting merge (pipeline §7). 
+ +# --- Project identity, dependencies, classifiers (only `version` is agent-editable) --- +/pyproject.toml @ayhammouda + +# --- Permanent commitments and trust posture --- +/LICENSE @ayhammouda +/SECURITY.md @ayhammouda + +# --- Load-bearing brand assets --- +/README.md @ayhammouda +/.planning/POSITIONING.md @ayhammouda + +# --- Release history (adding entries is fine; rewriting history is not) --- +/CHANGELOG.md @ayhammouda + +# --- CI/CD and supply chain (release path especially) --- +# The single /.github/ rule covers workflows and release.yml. Last-matching- +# pattern wins in CODEOWNERS — adding narrower entries with the same owner +# below would be no-ops and would silently *override* this rule if a different +# owner is ever added here, so we keep ownership of /.github/ uniform. +/.github/ @ayhammouda + +# --- Index schema and migrations (rebuilds existing user indexes) --- +# NOTE: the retrieved-docs *cache* table lives in +# src/mcp_server_python_docs/services/persistent_cache.py and is NOT covered +# here — it is best-effort, fingerprint-scoped, and agent-editable per +# decision 5.7. Only the canonical *index* schema is forbidden territory. +**/storage/schema.sql @ayhammouda +**/migrations/ @ayhammouda + +# --- Archival roadmap history --- +/.planning/ROADMAP.md @ayhammouda + +# --- Governing policy + strategy documents --- +/AGENT-EXECUTION-PIPELINE.md @ayhammouda +/OPENCLAW-FORGE-PROTOCOL.md @ayhammouda +/STRATEGIC-ROADMAP-2026-05-29.md @ayhammouda diff --git a/.github/ISSUE_TEMPLATE/autonomous-agent.yml b/.github/ISSUE_TEMPLATE/autonomous-agent.yml new file mode 100644 index 0000000..c0befd6 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/autonomous-agent.yml @@ -0,0 +1,108 @@ +name: Autonomous Agent Task +description: A task spec scoped for unattended execution by an autonomous coding agent (Claude Code or similar). 
+title: "[vX.Y.Z] " +body: + - type: markdown + attributes: + value: | + This template enforces the issue structure required by + `AGENT-EXECUTION-PIPELINE.md` §3 (in the repo root). An issue missing + any required section is **not** agent-ready and will not pass the §10 + pre-flight checklist. Do not apply the `agent-ready` label from this + template; a maintainer applies it only after reading the completed + issue end-to-end. Read the pipeline doc and + `STRATEGIC-ROADMAP-2026-05-29.md` before filling this out. + - type: textarea + id: context + attributes: + label: Context (self-containment) + description: Link to the per-issue context file, this pipeline doc, the roadmap, any relevant ADR or `.planning/phases/0X-*` directory, and prior related issues. + value: | + - Per-issue context file: `.planning/agent-context/.md` (read this first) + - Pipeline: `AGENT-EXECUTION-PIPELINE.md` + - Roadmap: `STRATEGIC-ROADMAP-2026-05-29.md` §
+ - Related issues: + validations: + required: true + - type: textarea + id: goal + attributes: + label: Goal (one sentence) + description: The single outcome that counts as success. + validations: + required: true + - type: textarea + id: acceptance + attributes: + label: Acceptance criteria (testable checkbox list) + description: Each criterion must be testable, atomic, achievable without touching forbidden territory, and verifiable in <5 minutes (pipeline §4). Prefer exact commands and expected output. + value: | + - [ ] `` + - [ ] `` + validations: + required: true + - type: textarea + id: scope + attributes: + label: Scope boundaries + description: Explicit In scope / Out of scope. Out-of-scope work is a stop-and-comment trigger, never silent expansion. + value: | + **In scope:** + - + + **Out of scope:** + - + validations: + required: true + - type: textarea + id: forbidden + attributes: + label: Forbidden-territory reminders + description: Repeat the AGENT-EXECUTION-PIPELINE.md §2 items relevant to THIS issue. If the task appears to require touching any of them, stop and comment. + validations: + required: true + - type: textarea + id: validation + attributes: + label: Validation commands (pipeline §5 gate) + description: The exact canonical gate, in order, plus any change-type-specific gates. Must pass before any PR is opened. + value: | + ```bash + uv run ruff check src/ tests/ + uv run pyright src/ + uv run pytest --tb=short -q + uv run python-docs-mcp-server doctor + ``` + validations: + required: true + - type: textarea + id: pr-and-recovery + attributes: + label: PR requirements & recovery + description: What the PR description must include (pipeline §6) and where to go if blocked (pipeline §8). + value: | + - PR title matches this issue title verbatim; body uses + `.github/PULL_REQUEST_TEMPLATE/agent.md`. + - Branch: `agent/-`. + - If blocked: stop, write `WORKING-NOTES.md` on the branch, comment on + this issue per pipeline §8. 
**No PR, no auto-merge, ever.** + validations: + required: true + - type: input + id: effort + attributes: + label: Effort estimate (hours) + description: Rough hours. Agent must bail and escalate if work exceeds 2× this estimate (pipeline §8). + validations: + required: true + - type: checkboxes + id: acknowledgements + attributes: + label: Agent acknowledgements + options: + - label: I will work on a branch, never on `main`, and will not auto-merge. + required: true + - label: I will stop and comment rather than silently expand scope or touch forbidden territory. + required: true + - label: I will add `🛑 needs-human-review` if any pipeline §7 trigger fires. + required: true diff --git a/.github/PULL_REQUEST_TEMPLATE/agent.md b/.github/PULL_REQUEST_TEMPLATE/agent.md new file mode 100644 index 0000000..1dc40cb --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE/agent.md @@ -0,0 +1,42 @@ + + +Closes # + +## Acceptance criteria + +- [ ] +- [ ] + +## Validation gate output + +```text +$ uv run ruff check src/ tests/ +$ uv run pyright src/ +$ uv run pytest --tb=short -q +$ uv run python-docs-mcp-server doctor +``` + + +## CodeRabbit review + +Pending. + +## Why this approach + + +## Why this triggered human review + +None. diff --git a/.planning/agent-context/adr-001-source-adapters.md b/.planning/agent-context/adr-001-source-adapters.md new file mode 100644 index 0000000..bb8beed --- /dev/null +++ b/.planning/agent-context/adr-001-source-adapters.md @@ -0,0 +1,49 @@ +# Agent Context — ADR-001 (Source Adapters) + +> One-read working context for issue `[v0.3.0] docs — write ADR-001 (Source Adapters)`. +> A **writing** task. Every claim must match the code — verify before you assert. + +## 1. Roadmap excerpts (the principles you are recording) + +- **Principle 2.1:** Canonical source only. CPython at a pinned tag for stdlib + docs; PyPI metadata API for package URLs. No scraped mirrors. No third-party indexers. +- **Principle 2.2:** Offline-first *runtime*. 
No network access at query time. +- **Principle 2.7:** Layered design with stable contracts; the **source + connector** is layer 1 of 8 and is what makes the pattern cloneable. + +## 2. The two source adapters that exist today (describe these) + +1. **CPython documentation source** (`src/mcp_server_python_docs/ingestion/`): + - `cpython_versions.py` — pinned build targets (`CPYTHON_DOCS_BUILD_CONFIG`: + per-version `tag` + `sphinx_pin`). Five versions: 3.10–3.14. + - `__main__.py` `build-index` path — `git clone --depth 1 --branch ` of + `python/cpython`, builds docs with `sphinx-build -b json` in a dedicated venv. + - `sphinx_json.py` — parses the Sphinx JSON output into the index; also loads + `synonyms.yaml`. `inventory.py` — parses `objects.inv` for exact symbol resolution. +2. **PyPI metadata source** (`src/mcp_server_python_docs/services/package_docs.py`): + - Backs `lookup_package_docs`. A **controlled** PyPI metadata lookup + (`GET /pypi//json`) that returns only project/docs/homepage/source + URLs — not a generic web fetch, not scraped docs. + +## 3. The one documented exception to "offline-first" + +- `lookup_package_docs` performs a network call to PyPI's metadata API. This is + **not** a docs-*query*-time call against the canonical stdlib index — it is a + controlled, narrowly-scoped metadata lookup. The ADR must state this exception + explicitly so the offline-first invariant (2.2) stays honest. (See README's + "Why not Context7" section and `SECURITY.md` scope for the existing framing.) + +## 4. Known pitfalls + +- **Verify, don't assume.** Open each cited file and confirm the behavior before + writing it into the ADR. An ADR that misstates current behavior is worse than none. +- Don't document adapters that don't exist (Rust/Go) beyond a single "future + adopters clone this contract" sentence — that's the cloneability point, not a claim. +- No code, schema, or workflow changes — writing only. 
+- Keep it factual; "reference architecture" is not claimed externally (5.6). + +## 5. Decision log + +- File path: +- Claims you verified against code (file:line): +- Anything ambiguous about the layer contract that you flagged for the maintainer: diff --git a/.planning/agent-context/adr-006-serialization.md b/.planning/agent-context/adr-006-serialization.md new file mode 100644 index 0000000..793521e --- /dev/null +++ b/.planning/agent-context/adr-006-serialization.md @@ -0,0 +1,52 @@ +# Agent Context — ADR-006 (Serialization) + +> One-read working context for issue `[v0.3.0] docs — write ADR-006 (Serialization)`. +> This is a **writing** task. You are recording locked decisions, not making new ones. + +## 1. Roadmap excerpts (the decisions you are recording — verbatim) + +- **Principle 2.5:** Wire format is explicit and pluggable on structured tools + only. Compact JSON default; TOON opt-in *if and only if* the empirical study + supports it. `get_docs` stays markdown. *Token economy is empirical, not architectural.* +- **Principle 2.7:** Layered design with stable contracts — eight layers, the + **serializer** being one of them. +- **Decision 5.3:** Storage stays SQLite + markdown. **TOON-as-storage killed.** +- **Decision 5.4:** Empirical Claude-tokenizer study **gates** the `format="toon"` decision. +- **Decision 5.5:** `format` parameter on `search_docs`, `list_versions`, + `compare_versions` **only**. JSON default; TOON opt-in. `get_docs` stays markdown. +- **Decision 5.8:** The study measures **client-side rewrap**, not just raw + payload tokens; reports tokens AND latency per tool family. + +## 2. Code touch-points (for accuracy — describe, do NOT change) + +- Tool results are Pydantic models in `src/mcp_server_python_docs/models.py` + (e.g. `GetDocsResult`); tools live in `server.py` and return those models, + which FastMCP serializes. 
The "serializer layer" is the conceptual seam where + a structured result becomes a wire string — that's what the `format` parameter + will eventually parameterize. You are documenting that seam, not building it. +- `get_docs` returns markdown content (`GetDocsResult.content`) — this is why it + is carved out of the `format` parameter (markdown is already the canonical body). + +## 3. Pattern to follow + +- There is no `docs/architecture/` ADR yet — you are establishing the house + style. Use the exact skeleton embedded in the issue. Keep it tight (1–2 pages). +- Number/name the file `docs/architecture/ADR-006-serialization.md` to match the + roadmap's ADR numbering (ADR-001 and ADR-006 are the first two written). + +## 4. Known pitfalls + +- **Do not invent.** If you find yourself making a serialization choice that is + not in §2 above, that's a pipeline §7 trigger ("cites a design choice not in + the issue") — stop and comment. +- **Do not implement `format`.** That is v0.3.x and is gated by the study. +- Don't claim a TOON token win — the study hasn't run. The ADR records that TOON + is *opt-in and gated*, with the bar being "win holds after client rewrap" (5.8). +- "Reference architecture" is **not** claimed externally (decision 5.6) — keep + the ADR factual, not promotional. + +## 5. Decision log + +- Final file path: +- Any wording you were unsure mapped to a locked decision (and how you resolved it): +- Open follow-ups (e.g. link to TOKEN-STUDY.md once it exists): diff --git a/.planning/agent-context/cpython-source-sha-pin.md b/.planning/agent-context/cpython-source-sha-pin.md new file mode 100644 index 0000000..7d431de --- /dev/null +++ b/.planning/agent-context/cpython-source-sha-pin.md @@ -0,0 +1,67 @@ +# Agent Context — CPython source SHA pin + +> One-read working context for issue `[v0.3.0] ingestion — pin CPython source by commit SHA`. +> PARTIAL issue: you do the pin + verification; the human writes the SECURITY.md prose. + +## 1. 
Roadmap excerpt + +> **Build-time supply-chain hardening** (roadmap §4, v0.3.0): Pin CPython source +> by SHA, not by tag. Document the threat model in SECURITY.md (the `build-index` +> CPython clone is the largest non-runtime attack surface). Verify Sphinx-build +> environment isolation. +> +> **Decision 5.10 (locked):** Build-time supply chain (the `build-index` CPython +> clone) is an explicit risk area; threat model documented in SECURITY.md; +> CPython source pinned by SHA. + +## 2. Code touch-points + +- `src/mcp_server_python_docs/ingestion/cpython_versions.py` + - `CPythonDocsBuildConfig(TypedDict)` — add `sha: str`. + - `CPYTHON_DOCS_BUILD_CONFIG` — five entries, currently `{"tag": ..., "sphinx_pin": ...}`: + `3.10→v3.10.20`, `3.11→v3.11.15`, `3.12→v3.12.13`, `3.13→v3.13.13`, `3.14→v3.14.4`. + Add the resolved SHA to each. Resolve with: + `git ls-remote https://github.com/python/cpython.git refs/tags/` + (use the dereferenced commit — the `^{}` line — not the annotated-tag object). +- `src/mcp_server_python_docs/__main__.py:210–226` — the clone: + `git clone --depth 1 --branch config["tag"] https://github.com/python/cpython.git `. + After it, add: `rev = git -C rev-parse HEAD`; if `rev != config["sha"]`, + log a clear error and **abort this version's build** (raise / skip-with-failure — + match the existing error-handling style in this function; do not silently continue). +- `tests/test_ingestion.py:53` — existing assertion + `config["tag"].startswith(f"v{version}.")`. Add a sibling assertion that + `config["sha"]` matches `^[0-9a-f]{40}$`. + +## 3. Patterns to follow + +- `tests/test_ingestion.py` iterates `CPYTHON_DOCS_BUILD_CONFIG.items()` for the + tag assertion — extend that same loop for the SHA assertion. No new fixtures. +- The clone block already uses `subprocess.run([...], check=True, capture_output=True, text=True)` + — reuse that idiom for the `rev-parse` call. + +## 4. 
Known pitfalls + +- **`--branch ` cannot take a raw SHA** on a shallow clone against GitHub by + default. Keep the tag-based shallow fetch; make the **SHA a post-clone + verification gate**, not the fetch ref. That is the integrity win: a moved/re-tagged + tag now fails the build instead of silently changing canonical content. +- Use the **dereferenced commit SHA** (peeled tag), not the annotated tag object's + own SHA — `rev-parse HEAD` after checkout gives the commit; match that. +- **Do not edit `SECURITY.md`** (forbidden). Draft the threat-model paragraph in + the PR body + decision log below for a human to paste. +- A full `build-index` clones over the network and takes minutes — do not gate the + PR on it. The unit tests cover the config + verification logic offline. +- Don't bump any tag to a newer CPython point release; pin the SHA of the + **current** tag only. + +## 5. Decision log + +- Resolved SHAs (tag → 40-hex commit), one line each: + - 3.10 / v3.10.20 → + - 3.11 / v3.11.15 → + - 3.12 / v3.12.13 → + - 3.13 / v3.13.13 → + - 3.14 / v3.14.4 → +- Where/how the verification aborts on mismatch: +- **Draft SECURITY.md threat-model paragraph (for human to paste):** + > diff --git a/.planning/agent-context/pyyaml-safe-loader-audit.md b/.planning/agent-context/pyyaml-safe-loader-audit.md new file mode 100644 index 0000000..a410bd1 --- /dev/null +++ b/.planning/agent-context/pyyaml-safe-loader-audit.md @@ -0,0 +1,52 @@ +# Agent Context — PyYAML safe-loader audit + +> One-read working context for issue `[v0.3.0] security — audit and document PyYAML safe-loader discipline`. + +## 1. Roadmap excerpt + +> **PyYAML safe-loader audit** (roadmap §4, v0.3.0): `synonyms.yaml` is loaded at +> startup; confirm only `yaml.safe_load` is used; document the trust boundary. +> +> **Decision 5.11 (locked):** PyYAML safe-loader-only discipline; `synonyms.yaml` +> is the only YAML input and is packaged with the wheel. + +## 2. 
Code touch-points (already audited for you — verify, then lock in) + +- `src/mcp_server_python_docs/server.py:54–57` — loads `data/synonyms.yaml` via + `importlib.resources` and `yaml.safe_load(path.read_text())`. ✅ safe. +- `src/mcp_server_python_docs/ingestion/sphinx_json.py:595–603` — loads the same + file via `importlib.resources` + `yaml.safe_load`, then type-checks it is a + mapping. ✅ safe. +- `src/mcp_server_python_docs/data/synonyms.yaml` — the only YAML data input; + packaged with the wheel. +- No other `yaml.load(` / `yaml.unsafe_load(` / custom-`Loader=` call sites were + found in `src/`. Your job is to **prove** this with a regression test, not just + assert it. + +## 3. Patterns to follow + +- `tests/test_synonyms.py` already exists — add the discipline test there. +- A clean way to assert the discipline: walk `src/` `.py` files and fail if any + line matches `yaml.load(` or `yaml.unsafe_load(` or `Loader=` (excluding + `SafeLoader`). Keep it simple and fast; no new deps. +- `tests/test_packaging.py` already verifies `synonyms.yaml` ships in the wheel — + reference it; you don't need to duplicate that. + +## 4. Known pitfalls + +- **Do not edit `SECURITY.md`** (forbidden). Capture the trust-boundary write-up + in a new `docs/architecture/YAML-TRUST-BOUNDARY.md` and recommend SECURITY.md + wording for a human. +- The two `safe_load` sites both also exist as `.pyc` in `__pycache__`; grep + source dirs only (`src/`, `tests/`), not `__pycache__`. +- If the codebase is already clean (expected), the deliverable is the **lock-in** + (regression test + doc note), not a code fix. Say so plainly in the PR. +- A literal `yaml.load(` string inside your *test* (as a pattern to search for) + is fine and expected — the test asserts it does not appear in non-test `src/`. + +## 5. 
Decision log + +- Audit result (clean / findings): +- Regression test name + what it scans: +- Trust-boundary doc location: +- Recommended SECURITY.md wording (for human): diff --git a/.planning/agent-context/readme-glama-six-tool-refresh.md b/.planning/agent-context/readme-glama-six-tool-refresh.md new file mode 100644 index 0000000..488548f --- /dev/null +++ b/.planning/agent-context/readme-glama-six-tool-refresh.md @@ -0,0 +1,54 @@ +# Agent Context — README / glama 6-tool refresh + +> One-read working context for issue `[v0.3.0] docs — refresh public surfaces to the 6-tool surface`. + +## 1. Roadmap excerpt + +> **README + PyPI + glama.json refresh** (roadmap §4, v0.3.0): Reflect the 6-tool +> surface including `compare_versions`. **Decision 5.9:** adopt as a release-cycle +> discipline going forward — every release updates the public-facing tool table. +> Roadmap §3 notes the surface "still lists 5 in some surfaces." + +## 2. Code / file touch-points + +- **Tool order of truth:** `src/mcp_server_python_docs/server.py`, the `@mcp.tool` + declarations in this order: `search_docs` (≈297), `get_docs` (≈318), + `lookup_package_docs` (≈341), `list_versions` (≈358), `detect_python_version` + (≈372), `compare_versions` (≈394). +- `README.md`: + - `## Tools` section at **line ~178**: already a six-row table in the correct + order. Verify, don't churn it. + - **Stale badge near the top:** `MCP%20Registry-v0.1.4`. Current published + registry/PyPI version is **0.2.1**. This is the concrete fix and the only + allowed edit above the first install code block. + - Hero section = everything **above the first install code block** (~line 125) + — FORBIDDEN except the stale registry/version badge above. +- `glama.json`: `description` field (prose, no tool list today). +- `.github/RELEASE.md`: add one checklist line for decision 5.9. + +## 3. 
Patterns to follow + +- `tests/test_packaging.py` asserts packaging consistency — run it; if you add a + surface, see whether a cheap assertion belongs there (optional, not required). +- Badge lines in `README.md` are markdown image links; match the existing style + when updating the version. + +## 4. Known pitfalls + +- **False positive — do NOT "fix":** `.github/INTEGRATION-TEST.md:139` says + "all five versions". That is the **five Python versions (3.10–3.14)**, not five + tools. Leave it alone. +- **PyPI short description is forbidden.** It comes from `pyproject.toml` + `[project].description`. If it's stale, comment — do not edit `pyproject.toml`. + (The README *body* you edit *is* the PyPI long description on the next release, + which is fine; only the hero is off-limits.) +- `README.md`, `.github/RELEASE.md`, and `glama.json` are all CODEOWNERS-owned. + Your PR will request maintainer review by design — note it, don't fight it. +- Don't bump `server.json` / package versions here; that's release-managed. + +## 5. Decision log + +- Badge: pinned to `0.2.1` vs made version-agnostic — which and why: +- Surfaces audited and their state (README Tools / glama / server.json): +- RELEASE.md checklist line added: +- Anything left for the maintainer (e.g. stale pyproject short description): diff --git a/.planning/agent-context/zstd-cache-codec.md b/.planning/agent-context/zstd-cache-codec.md new file mode 100644 index 0000000..75bdc7b --- /dev/null +++ b/.planning/agent-context/zstd-cache-codec.md @@ -0,0 +1,71 @@ +# Agent Context — zstd cache codec + +> One-read working context for issue `[v0.3.0] cache — add zstd codec layer`. +> Everything you need is here; do not go fishing in `.planning/` archive material. + +## 1. Roadmap excerpt (the goal — do not re-derive) + +> **Workstream J — app-level zstd cache compression** (roadmap §4, v0.3.0): +> Targets the retrieved-docs cache *value column only*. Trained dict on a +> representative `get_docs` corpus. 
Codec column for forward-compat. Expected +> ratio strong because zstd's dictionary mode is especially effective on small +> correlated records — exactly the cache-entry shape. +> +> **Decision 5.7 (locked):** App-level zstd on retrieved-docs cache, *no gate*. +> Versioned codec column for forward-compat. + +## 2. Code touch-points (file paths + symbols) + +- `src/mcp_server_python_docs/services/persistent_cache.py` + - `retrieved_docs_cache` table — created inline with `CREATE TABLE IF NOT EXISTS` at **line ~47**. Columns today: `index_fingerprint, version, slug, anchor, max_chars, start_index, result_json TEXT, created_at`. PK is the first six columns. + - `put(...)` — **line ~118**, `INSERT OR REPLACE ... result_json` = `result.model_dump_json()`. + - `get(...)` — **line ~80**, `SELECT result_json ...` then `GetDocsResult.model_validate_json(row[0])`. + - This is **best-effort**: every read/write is wrapped in try/except and the cache disables cleanly (`self._conn = None`) on error. Preserve that posture. +- **New module:** `src/mcp_server_python_docs/cache/codec.py` (create the `cache/` package with `__init__.py`). Public API the pipeline §4 example expects: + - `list_supported() -> list[str]` → `['none', 'zstd', 'zstd-dict-v1']` + - `encode(text: str, codec: str, *, dictionary=None) -> bytes`, `decode(blob: bytes, codec: str, *, dictionary=None) -> str`. +- **Tests:** new dir `tests/cache/` with `__init__.py` and `test_codec.py`. + +## 3. Existing test patterns to follow + +- `tests/test_persistent_docs_cache.py` shows the established cache-test idiom: + the `_cache(tmp_path, marker)` helper builds an index fingerprint file + + `PersistentDocsCache`, and `populated_db` (a fixture from `tests/conftest.py`) + provides a live SQLite index. Reuse this shape for the no-regression test. 
+- The "survives restart" test (`test_cache_survives_restart_and_miss_falls_back`) + is the exact pattern for your "value reads back identically after restart" criterion: + build a `PersistentDocsCache`, write, construct a *second* instance on the same + files, assert `hits == 1` and equality. +- Tests are plain `pytest` functions, `from __future__ import annotations`, type-annotated args. Match that. + +## 4. Known pitfalls + +- **The cache table is NOT `storage/schema.sql`.** `schema.sql` is the canonical + *index* schema and is forbidden territory. The cache table is owned by + `persistent_cache.py` and is yours to evolve per 5.7. Do not confuse them. +- **Existing on-disk caches lack the new column.** `CREATE TABLE IF NOT EXISTS` + will *not* add a column to an existing table. Detect the column + (`PRAGMA table_info(retrieved_docs_cache)`) and `ALTER TABLE ... ADD COLUMN + compression TEXT NOT NULL DEFAULT 'none'` when missing, inside the same + try/except that already tolerates a broken cache. +- **Value column type.** `result_json` is `TEXT`. Compressed output is `bytes`. + Either store the blob in a new `BLOB` column and keep `result_json` for the + `'none'` path, or store all payloads as `BLOB` and record the codec. Simplest + forward-compatible design: keep `result_json` semantics for `'none'`, add a + `result_blob BLOB` for compressed codecs, and let `compression` select which + column to read. Document whichever you choose in the decision log below. +- **`zstandard` must already be importable** (maintainer pre-req). If + `import zstandard` fails, STOP and comment — do not edit `pyproject.toml`/`uv.lock`. +- **`zstd-dict-v1` has no production dictionary in this issue.** Make the codec + *work* only when an explicit dictionary object is supplied by tests. The + cache's default production codec is `'zstd'`. Shipping a trained dictionary + artifact is a separate, human-gated follow-up. 
+- Decode must dispatch off the stored `compression` value, never off the current + default — otherwise old `'none'` rows break the day the default flips. + +## 5. Decision log (fill this in as you work) + +- Chosen value-storage layout (`result_blob` vs reuse `result_json`): +- How `zstd-dict-v1` round-trips in tests (dict training approach): +- Default production codec wired into `put()`: +- Anything you deferred or escalated: diff --git a/.planning/issues/v0.3.0/00-README.md b/.planning/issues/v0.3.0/00-README.md new file mode 100644 index 0000000..dd5c7ba --- /dev/null +++ b/.planning/issues/v0.3.0/00-README.md @@ -0,0 +1,64 @@ +# v0.3.0 — Agent-Ready Issue Set + +Generated from `STRATEGIC-ROADMAP-2026-05-29.md` §4/§9 and +`AGENT-EXECUTION-PIPELINE.md` §3–§13. Each issue here has a matching per-issue +context file under `.planning/agent-context/.md` (pipeline §12, decision 5.14). + +GitHub issue numbers are filled in below as issues are created (post pre-flight). + +## Wave order (by confidence) + +| # | Issue | Confidence | Slug | GH # | +|---|-------|-----------|------|------| +| 01 | cache — add zstd codec layer | **HIGH** | `zstd-cache-codec` | [#46](https://github.com/ayhammouda/python-docs-mcp-server/issues/46) | +| 02 | docs — refresh public surfaces to 6-tool surface | **HIGH** | `readme-glama-six-tool-refresh` | [#47](https://github.com/ayhammouda/python-docs-mcp-server/issues/47) | +| 03 | security — PyYAML safe-loader audit | MEDIUM | `pyyaml-safe-loader-audit` | [#48](https://github.com/ayhammouda/python-docs-mcp-server/issues/48) | +| 04 | docs — write ADR-006 (Serialization) | MEDIUM | `adr-006-serialization` | [#49](https://github.com/ayhammouda/python-docs-mcp-server/issues/49) | +| 05 | docs — write ADR-001 (Source Adapters) | MEDIUM | `adr-001-source-adapters` | [#50](https://github.com/ayhammouda/python-docs-mcp-server/issues/50) | +| 06 | ingestion — pin CPython source by SHA | PARTIAL | `cpython-source-sha-pin` | 
[#51](https://github.com/ayhammouda/python-docs-mcp-server/issues/51) | + +> **Live status:** Issues #46–#51 exist on GitHub with topical labels only. +> The `agent-ready` label is **withheld** until you complete the §10 read — +> applying it earlier would falsely signal "passed pre-flight." Apply it per +> issue once you've read it end-to-end. + +**Starter four (obvious overnight wins, de-risk the rest):** 02, 03, 04, 05. +ADR-006 (04) leads the ADR work because it unblocks the v0.3.x `format` parameter. +01 is intentionally delayed until the dependency and dictionary/context API prep +are resolved by a maintainer. 06 trails; it is PARTIAL and **must** carry +`🛑 needs-human-review` because it produces SECURITY.md wording for a human and +touches the supply-chain path. + +## Explicitly NOT in the agent wave (human-led, roadmap §9.1) + +- **30-minute TOON Python port audit** — subjective quality judgment. +- **Empirical token study** — methodology + corpus selection require judgment. + (An agent *may* later scaffold the harness against a human-written + `docs/architecture/TOKEN-STUDY-METHODOLOGY.md`, but that spec doesn't exist yet.) + +## Pre-flight checklist (pipeline §10) — status + +- [x] §9 context files exist on a branch: `AGENTS.md` updated (links pipeline), + `.github/ISSUE_TEMPLATE/autonomous-agent.yml`, `.github/PULL_REQUEST_TEMPLATE/agent.md`, + `.github/CODEOWNERS` created. **(Land these on `main` before queueing.)** +- [ ] §5 canonical gate passes on `main` from a clean clone (maintainer to confirm). +- [ ] Each issue read end-to-end by a human and labeled `agent-ready`. +- [x] `🛑 needs-human-review` and `agent-ready` labels created in the repo. +- [x] CODEOWNERS forces review on `pyproject.toml`, `.github/workflows/`, `LICENSE`, + `README.md`, `.planning/POSITIONING.md`, `schema.sql` (and more — see file). +- [ ] Branch protection on `main` requires ≥1 human approval + "Require review + from Code Owners" (maintainer to confirm in repo settings). 
+- [x] At least one issue ≤4h for a confidence-building first run: 02 (~1h), 03 (~1–1.5h). + +## Per-issue maintainer pre-reqs + +- **01 (zstd):** add `zstandard>=0.23.0` to `pyproject.toml [project].dependencies` + and run `uv lock` **before** queueing — the agent cannot edit forbidden territory. + +## How these issues were bootstrapped + +Issues #46–#51 were created from the files in this directory with +`gh issue create -F .planning/issues/v0.3.0/.md`, after `agent-ready` and +`🛑 needs-human-review` labels were created in the repo. Do **not** re-run that +loop — it would duplicate live issues. Edit the spec file *and* the GitHub +issue body when a change is needed. diff --git a/.planning/issues/v0.3.0/01-zstd-cache-codec.md b/.planning/issues/v0.3.0/01-zstd-cache-codec.md new file mode 100644 index 0000000..35f07c8 --- /dev/null +++ b/.planning/issues/v0.3.0/01-zstd-cache-codec.md @@ -0,0 +1,83 @@ +# [v0.3.0] cache — add zstd codec layer to the retrieved-docs cache + +> **Confidence:** HIGH · **Wave:** lead · **Slug:** `zstd-cache-codec` +> Create on GitHub with: `gh issue create -F .planning/issues/v0.3.0/01-zstd-cache-codec.md -l area:runtime,priority:P2` +> Branch (after number assigned): `agent/-zstd-cache-codec` + +## ⛔ Blocking pre-requisite (maintainer, before queueing) + +This task needs the `zstandard` runtime dependency, and `pyproject.toml [project]` +is **forbidden territory** (pipeline §2) plus a §7 human-review trigger. The +maintainer must add it and refresh the lockfile **before** this issue is queued: + +```toml +# pyproject.toml [project].dependencies +"zstandard>=0.23.0", +``` +```bash +uv lock +``` + +If `python -c "import zstandard"` fails when the agent starts, the agent **stops +and comments** (pipeline §8) — it must not edit `pyproject.toml` or `uv.lock`. 
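The pre-flight import check above can also be run from Python without actually importing the package. This is a minimal sketch only: the helper name `dependency_available` is hypothetical and not part of the repository, and the stop-and-comment step itself stays a manual pipeline §8 action.

```python
from __future__ import annotations

import importlib.util


def dependency_available(module_name: str) -> bool:
    """Return True if a top-level module can be located on the import path.

    find_spec() only locates the module; it does not execute it, so the
    check is cheap and has no import-time side effects.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the agent treats a missing dependency as a hard stop.
    if not dependency_available("zstandard"):
        raise SystemExit("zstandard is not importable: stop and comment (pipeline §8)")
```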
+ +## Context + +- **Per-issue context file (read first):** [`.planning/agent-context/zstd-cache-codec.md`](../../agent-context/zstd-cache-codec.md) +- Pipeline: [`AGENT-EXECUTION-PIPELINE.md`](../../../AGENT-EXECUTION-PIPELINE.md) +- Roadmap: [`STRATEGIC-ROADMAP-2026-05-29.md`](../../../STRATEGIC-ROADMAP-2026-05-29.md) §4 (v0.3.0, "Workstream J"), decision **5.7** +- Touch-point: `src/mcp_server_python_docs/services/persistent_cache.py` (the `retrieved_docs_cache` table, `put()`, `get()`) +- New module: `src/mcp_server_python_docs/cache/codec.py` (path chosen to match the pipeline §4 acceptance example) + +## Goal + +Compress the retrieved-docs cache value column with an app-level, versioned zstd +codec that reads pre-existing uncompressed rows transparently. + +## Acceptance criteria + +- [ ] `python -c 'from mcp_server_python_docs.cache.codec import list_supported; print(list_supported())'` prints exactly `['none', 'zstd', 'zstd-dict-v1']`. +- [ ] `uv run pytest tests/cache/test_codec.py -q` passes with **at least 4** new tests covering: round-trip for `'none'`, round-trip for `'zstd'`, round-trip for `'zstd-dict-v1'` using an explicit test-only dictionary object, and graceful decode of a value written under `compression='none'` by a prior server version. +- [ ] The `retrieved_docs_cache` table gains a `compression TEXT NOT NULL DEFAULT 'none'` column, added via `ALTER TABLE ... ADD COLUMN` when an older cache file lacks it (existence-checked), so an existing on-disk cache opens without error and serves its rows. +- [ ] `uv run pytest tests/test_persistent_docs_cache.py -q` still passes (no regression to the existing cache contract), and a new test asserts a value written by the current server reads back identically after a simulated restart with the default production codec. +- [ ] The cache writes new entries with a single configurable default codec (`'zstd'`); `get()` dispatches decode purely off the stored `compression` value, never off the default. 
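The existence-checked migration named in the third criterion could look roughly like the sketch below. It is an illustration under assumptions, not the project's implementation: the `key` and `value` columns and the helper name `ensure_compression_column` are hypothetical; only the table name, the column definition, and the `ALTER TABLE ... ADD COLUMN` approach come from this issue.

```python
import sqlite3

TABLE = "retrieved_docs_cache"


def ensure_compression_column(conn: sqlite3.Connection) -> bool:
    """Add the `compression` column if an older cache file lacks it.

    Returns True when the column was added, False when it already existed.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = {row[1] for row in conn.execute(f"PRAGMA table_info({TABLE})")}
    if "compression" in columns:
        return False
    conn.execute(
        f"ALTER TABLE {TABLE} ADD COLUMN compression TEXT NOT NULL DEFAULT 'none'"
    )
    conn.commit()
    return True


# Simulate a cache file written by a prior server version (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (key TEXT PRIMARY KEY, value BLOB NOT NULL)")
conn.execute(f"INSERT INTO {TABLE} (key, value) VALUES (?, ?)", ("k", b"old row"))

added_first = ensure_compression_column(conn)
added_second = ensure_compression_column(conn)  # idempotent: no second ALTER
legacy_row = conn.execute(
    f"SELECT value, compression FROM {TABLE} WHERE key = ?", ("k",)
).fetchone()
```

Because the new column carries a default of `'none'`, rows written before the migration report that codec on read, which is what lets `get()` dispatch on the stored value without special-casing old rows.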
+ +## Scope boundaries + +**In scope:** +- New `cache/codec.py` with `list_supported()`, `encode(text, codec, *, dictionary=None) -> bytes`, `decode(blob, codec, *, dictionary=None) -> str`, and a registry mapping codec id → handler. +- `compression` column on `retrieved_docs_cache` + transparent migration of existing cache files. +- Wiring `put()`/`get()` in `persistent_cache.py` through the codec. +- Tests under `tests/cache/`. + +**Out of scope (do NOT do these — stop and comment if they seem required):** +- Training and **packaging a production `zstd-dict-v1` dictionary** from a real `get_docs` corpus — corpus selection is a human judgment call per roadmap §4. The `zstd-dict-v1` codec must *function* with an explicit dictionary object supplied by tests, but no production dictionary artifact ships in this issue. +- Any change to the **canonical index** schema (`src/mcp_server_python_docs/storage/schema.sql`). +- Any tool name, parameter, or return shape. +- Compressing `get_docs` markdown on the wire — this is cache-at-rest only. + +## Forbidden-territory reminders (pipeline §2) + +- `pyproject.toml [project]` — the `zstandard` dep is a maintainer pre-req; do not edit. +- `src/**/storage/schema.sql` and migrations — the *index* schema is off-limits. (The *cache* table in `persistent_cache.py` is NOT the index schema and is in scope per decision 5.7.) +- Existing tests — extend, never delete or weaken. + +## Validation commands (pipeline §5) + +Run the canonical four-command gate from `AGENT-EXECUTION-PIPELINE.md` §5, then +the change-specific gate below (the cache lives on the `get_docs` path, so the +wire smoke matters): + +```bash +uv run pytest tests/test_stdio_smoke.py -q +``` + +## PR template & recovery + +- PR body uses `.github/PULL_REQUEST_TEMPLATE/agent.md`; title matches this issue verbatim. 
+- Adding a third-party runtime dep is a §7 trigger — but if the maintainer pre-added `zstandard`, the PR itself introduces no new dep; state that under "Why this triggered human review: None." +- Blocked? Stop, write `WORKING-NOTES.md`, comment per pipeline §8. No PR, no auto-merge. + +## Effort estimate + +~2–3 hours. diff --git a/.planning/issues/v0.3.0/02-readme-glama-six-tool-refresh.md b/.planning/issues/v0.3.0/02-readme-glama-six-tool-refresh.md new file mode 100644 index 0000000..c7ceed4 --- /dev/null +++ b/.planning/issues/v0.3.0/02-readme-glama-six-tool-refresh.md @@ -0,0 +1,59 @@ +# [v0.3.0] docs — refresh public surfaces to the 6-tool surface + +> **Confidence:** HIGH · **Wave:** lead · **Slug:** `readme-glama-six-tool-refresh` +> Create with: `gh issue create -F .planning/issues/v0.3.0/02-readme-glama-six-tool-refresh.md -l documentation,priority:P2` +> Branch: `agent/-readme-glama-six-tool-refresh` + +## Context + +- **Per-issue context file (read first):** [`.planning/agent-context/readme-glama-six-tool-refresh.md`](../../agent-context/readme-glama-six-tool-refresh.md) +- Pipeline: [`AGENT-EXECUTION-PIPELINE.md`](../../../AGENT-EXECUTION-PIPELINE.md) +- Roadmap: [`STRATEGIC-ROADMAP-2026-05-29.md`](../../../STRATEGIC-ROADMAP-2026-05-29.md) §3, §4 (v0.3.0), decision **5.9** (this becomes a release-cycle discipline) +- Tool registration order of truth: `src/mcp_server_python_docs/server.py` (`@mcp.tool` order) + +## Goal + +Make every public-facing surface consistently describe the six-tool surface +(including `compare_versions`) and codify the refresh as a release-cycle step. + +## Acceptance criteria + +- [ ] `README.md` `## Tools` section lists exactly six rows, including `compare_versions`, in the same order as the `@mcp.tool` declarations in `server.py` (search_docs, get_docs, lookup_package_docs, list_versions, detect_python_version, compare_versions). Verify: the six tool names appear once each in the table. 
+- [ ] The stale `MCP%20Registry-v0.1.4` badge in `README.md` is updated to the current published registry version (`0.2.1`) **or** made version-agnostic; no badge advertises a version older than the latest release. This is the only allowed edit above the first install code block. +- [ ] `grep -rin 'five tools\|5 tools\|exposes five' README.md glama.json server.json` returns zero hits. +- [ ] `glama.json` `description` is accurate for the current surface and does not contradict the 6-tool README. +- [ ] `.github/RELEASE.md` gains a checklist line establishing decision 5.9: "Refresh README `## Tools`, `glama.json`, and registry/version badges to match the current tool surface." (one line; existing content untouched.) + +## Scope boundaries + +**In scope:** the README registry/version badge, `README.md` body below the hero (the `## Tools` table and prose tool counts), `glama.json` description, and one checklist line in `.github/RELEASE.md`. + +**Out of scope (stop and comment if required):** +- The `README.md` **hero section** (everything above the first install code block) — forbidden territory **except the stale registry/version badge explicitly called out in the acceptance criteria**. +- `pyproject.toml` — the PyPI *short* description and `[project]` metadata derive from here and are forbidden. If the short description is stale, **comment, do not edit**. +- `server.json` `version` / package versions — release-managed, not part of this doc refresh. +- The line `all five versions` in `.github/INTEGRATION-TEST.md` — that refers to the **five Python versions** (3.10–3.14), not five tools. Do **not** "fix" it. + +## Forbidden-territory reminders (pipeline §2) + +- `README.md` hero section — do not touch, apart from the stale registry/version badge named in the acceptance criteria. +- `pyproject.toml [project]` — do not touch. +- This PR will touch `README.md`, `.github/RELEASE.md`, and `glama.json`; the first two are CODEOWNERS-owned (`/README.md` and `/.github/`). Expect required maintainer review; that is correct, not a defect. 
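The "six rows, once each, in `@mcp.tool` order" check from the first acceptance criterion can be approximated with a small script. This is a sketch under assumptions: the table is taken to be pipe-delimited with the backticked tool name in the first cell, and `tools_in_table` is a hypothetical helper. The six names and their order are the ones listed in this issue.

```python
import re

EXPECTED_TOOLS = [
    "search_docs",
    "get_docs",
    "lookup_package_docs",
    "list_versions",
    "detect_python_version",
    "compare_versions",
]


def tools_in_table(markdown: str) -> list[str]:
    """Return tool names found in the first cell of each table row, in order."""
    names = []
    for line in markdown.splitlines():
        # Only rows whose first cell is a backticked identifier count as tools;
        # the header row and the |---| separator row are skipped.
        match = re.match(r"^\|\s*`(\w+)`\s*\|", line)
        if match:
            names.append(match.group(1))
    return names


# Hypothetical README excerpt; the descriptions are placeholders.
sample = """\
## Tools

| Tool | Purpose |
|---|---|
| `search_docs` | ... |
| `get_docs` | ... |
| `lookup_package_docs` | ... |
| `list_versions` | ... |
| `detect_python_version` | ... |
| `compare_versions` | ... |
"""
found = tools_in_table(sample)
```

Comparing `found` against `EXPECTED_TOOLS` as lists covers count, uniqueness, and order in one assertion.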
+ +## Validation commands (pipeline §5) + +Run the canonical four-command gate from `AGENT-EXECUTION-PIPELINE.md` §5, then +the change-specific gate below (README/metadata consistency): + +```bash +uv run pytest tests/test_packaging.py -q +``` + +## PR template & recovery + +- Use `.github/PULL_REQUEST_TEMPLATE/agent.md`. Under "Why this triggered human review", note: "Touches CODEOWNERS-owned brand/release docs (`README.md`, `.github/RELEASE.md`); opened for review, not auto-merge." +- Blocked? Stop, `WORKING-NOTES.md`, comment per §8. + +## Effort estimate + +~1 hour. diff --git a/.planning/issues/v0.3.0/03-pyyaml-safe-loader-audit.md b/.planning/issues/v0.3.0/03-pyyaml-safe-loader-audit.md new file mode 100644 index 0000000..f0cd5c8 --- /dev/null +++ b/.planning/issues/v0.3.0/03-pyyaml-safe-loader-audit.md @@ -0,0 +1,50 @@ +# [v0.3.0] security — audit and document PyYAML safe-loader discipline + +> **Confidence:** MEDIUM · **Wave:** lead · **Slug:** `pyyaml-safe-loader-audit` +> Create with: `gh issue create -F .planning/issues/v0.3.0/03-pyyaml-safe-loader-audit.md -l compliance,priority:P2` +> Branch: `agent/-pyyaml-safe-loader-audit` + +## Context + +- **Per-issue context file (read first):** [`.planning/agent-context/pyyaml-safe-loader-audit.md`](../../agent-context/pyyaml-safe-loader-audit.md) +- Pipeline: [`AGENT-EXECUTION-PIPELINE.md`](../../../AGENT-EXECUTION-PIPELINE.md) +- Roadmap: [`STRATEGIC-ROADMAP-2026-05-29.md`](../../../STRATEGIC-ROADMAP-2026-05-29.md) §4 (v0.3.0), decision **5.11** +- Known YAML call sites: `src/mcp_server_python_docs/server.py:57`, `src/mcp_server_python_docs/ingestion/sphinx_json.py:597` (both already `yaml.safe_load`); input file `src/mcp_server_python_docs/data/synonyms.yaml` + +## Goal + +Prove and lock in that `synonyms.yaml` is the only YAML input and is loaded only +via `yaml.safe_load`, with the trust boundary documented and regression-guarded. 
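A regression guard of the kind this goal describes might be sketched as follows. The helper name `find_unsafe_yaml_loads` and the throwaway directory are assumptions made for illustration; a real test would point the scan at `src/` and assert the result is empty.

```python
import re
import tempfile
from pathlib import Path

# Matches the unsafe loader call. The safe variant does not match because
# "safe_" sits between the dot and "load".
UNSAFE_LOAD = re.compile(r"\byaml\.load\(")


def find_unsafe_yaml_loads(root: Path) -> list[tuple[str, int]]:
    """Return (relative path, line number) for each unsafe-loader call under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if UNSAFE_LOAD.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits


# Demonstration on a temporary tree rather than the real source directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "good.py").write_text(
        "import yaml\ndata = yaml.safe_load(text)\n", encoding="utf-8"
    )
    (root / "bad.py").write_text(
        "import yaml\ndata = yaml.load(text)\n", encoding="utf-8"
    )
    hits = find_unsafe_yaml_loads(root)
```

This is a plain text scan, so it would also flag the pattern inside comments or string literals. That strictness is probably acceptable for a lock-in test, but it is a design choice worth stating in the PR.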
+ +## Acceptance criteria + +- [ ] `grep -rn 'yaml.load(' src/` returns **zero** hits. +- [ ] `grep -rn 'yaml.safe_load(' src/` returns at least the two expected call sites (`server.py`, `ingestion/sphinx_json.py`). +- [ ] `grep -rln '\.ya\?ml' src/mcp_server_python_docs/` shows `data/synonyms.yaml` is the only YAML **data input** loaded at runtime/ingestion (any others are config, not parsed input — enumerate them in the PR). +- [ ] A new test `tests/test_synonyms.py::test_yaml_loaded_only_via_safe_load` (or a clearly named addition) asserts the discipline programmatically — e.g. scans `src/` for `yaml.load(` and fails if any unsafe loader appears, and confirms the synonyms loaders use `safe_load`. +- [ ] A short "YAML trust boundary" note is added to the in-repo docs the agent IS allowed to edit (a new `docs/architecture/YAML-TRUST-BOUNDARY.md`, or the per-issue context decision log) stating: synonyms.yaml is packaged with the wheel, is the sole YAML input, and is parsed only with `safe_load`. **Do not edit `SECURITY.md`** (forbidden). + +## Scope boundaries + +**In scope:** read-only audit (grep), a regression test asserting the discipline, and a new architecture note documenting the trust boundary. If a genuinely unsafe `yaml.load(` is found, the fix (switch to `safe_load`) is in scope — but surface the finding in a comment first. + +**Out of scope:** changing `synonyms.yaml` contents or schema; touching ingestion behavior; editing `SECURITY.md`. + +## Forbidden-territory reminders (pipeline §2) + +- `SECURITY.md` — trust-posture prose requires deliberate human review. Capture findings in a new `docs/architecture/` note instead and recommend the `SECURITY.md` wording for a human to apply. +- Existing tests — extend, never weaken. + +## Validation commands (pipeline §5) + +Run the canonical four-command gate from `AGENT-EXECUTION-PIPELINE.md` §5. No +change-type-specific additional gates apply. 
+ +## PR template & recovery + +- Use `.github/PULL_REQUEST_TEMPLATE/agent.md`. If the audit finds the codebase already clean, say so explicitly and ship the regression test + doc note (the value is the lock-in, not a fix). +- Found something genuinely unsafe? That is a security finding — comment on the issue before changing code. + +## Effort estimate + +~1–1.5 hours. diff --git a/.planning/issues/v0.3.0/04-adr-006-serialization.md b/.planning/issues/v0.3.0/04-adr-006-serialization.md new file mode 100644 index 0000000..c897dc9 --- /dev/null +++ b/.planning/issues/v0.3.0/04-adr-006-serialization.md @@ -0,0 +1,91 @@ +# [v0.3.0] docs — write ADR-006 (Serialization) + +> **Confidence:** MEDIUM · **Wave:** lead · **Slug:** `adr-006-serialization` +> Create with: `gh issue create -F .planning/issues/v0.3.0/04-adr-006-serialization.md -l documentation,priority:P2` +> Branch: `agent/-adr-006-serialization` + +## Context + +- **Per-issue context file (read first):** [`.planning/agent-context/adr-006-serialization.md`](../../agent-context/adr-006-serialization.md) +- Pipeline: [`AGENT-EXECUTION-PIPELINE.md`](../../../AGENT-EXECUTION-PIPELINE.md) +- Roadmap: [`STRATEGIC-ROADMAP-2026-05-29.md`](../../../STRATEGIC-ROADMAP-2026-05-29.md) — principle **2.5**, **2.7**; decisions **5.3, 5.4, 5.5, 5.8** +- ADR-006 "specifically enables the v0.3.x `format` parameter work" (roadmap §4). + +## Goal + +Record the already-locked serialization decision as `docs/architecture/ADR-006-serialization.md` so the v0.3.x `format` work has a stable, citable contract. + +## Acceptance criteria + +- [ ] `docs/architecture/ADR-006-serialization.md` exists and fills **every** section of the template embedded below (no placeholder text left). +- [ ] Status is **Accepted** (decisions 5.4/5.5 are already locked) with date `2026-05-29` and decider `@ayhammouda`. 
+- [ ] The "Decision Outcome" states verbatim the locked shape: compact **JSON is the default**; `format="toon"` is **opt-in and gated by the v0.3.0 empirical study** (5.4); the `format` parameter exists on **`search_docs`, `list_versions`, `compare_versions` only**; **`get_docs` stays markdown** (5.5); **TOON-as-storage is rejected** (5.3). +- [ ] The "Layer Contract" section names the serializer as one of the eight layers (principle 2.7) and states its inputs (structured tool result model), outputs (wire string), and invariant (serialization is a pure function of the result + chosen format; no behavior change for clients that don't opt in). +- [ ] "Considered Options" includes at least: JSON-only, JSON-default-with-TOON-opt-in (chosen), and TOON-as-storage (rejected, ref 5.3); and notes that the win must hold **after client-side rewrap** (5.8), not just on raw payload. +- [ ] `uv run python-docs-mcp-server doctor` passes (this is a docs-only change; no code touched). + +## Scope boundaries + +**In scope:** one new ADR markdown file under `docs/architecture/`. Cross-link from the context file's decision log. + +**Out of scope (stop and comment if it seems required):** +- **Implementing** the `format` parameter — that is v0.3.x, gated by the study. +- Inventing any serialization decision **not already in the roadmap**. The ADR *records* locked decisions; it does not make new ones. (Doing so is a pipeline §7 trigger — "cites a design choice not in the issue.") +- Editing tool signatures, `models.py`, or `server.py`. + +## Forbidden-territory reminders (pipeline §2) + +- No tool name/parameter/return-shape changes — this is a writing task only. +- Do not re-open locked decisions 5.3–5.5; cite them. + +## Validation commands (pipeline §5) + +Run the canonical four-command gate from `AGENT-EXECUTION-PIPELINE.md` §5. No +change-type-specific additional gates apply (this is a docs-only change). 
+ +## ADR template (use exactly this skeleton) + +```markdown +# ADR-006: Serialization & Wire Format + +- **Status:** Accepted +- **Date:** 2026-05-29 +- **Deciders:** @ayhammouda +- **Roadmap refs:** principles 2.5, 2.7; decisions 5.3, 5.4, 5.5, 5.8 + +## Context and Problem Statement + + +## Decision Drivers + + +## Considered Options +1. JSON only. +2. JSON default + `format="toon"` opt-in on structured tools. (chosen) +3. TOON as the storage format. (rejected — decision 5.3) + +## Decision Outcome + + +### Consequences +**Positive:** ... +**Negative / risks:** ... + +## Layer Contract (principle 2.7) +- **Inputs:** ... +- **Outputs:** ... +- **Invariants:** ... + +## Links +- STRATEGIC-ROADMAP-2026-05-29.md §2.5, §5.3–5.5, §5.8 +- (future) v0.3.0 TOKEN-STUDY.md +``` + +## PR template & recovery + +- Use `.github/PULL_REQUEST_TEMPLATE/agent.md`. Under "Why this approach", note the ADR only records roadmap-locked decisions. +- Ambiguity in what's locked? Stop and comment — do not invent. + +## Effort estimate + +~2 hours. 
diff --git a/.planning/issues/v0.3.0/05-adr-001-source-adapters.md b/.planning/issues/v0.3.0/05-adr-001-source-adapters.md new file mode 100644 index 0000000..96f963c --- /dev/null +++ b/.planning/issues/v0.3.0/05-adr-001-source-adapters.md @@ -0,0 +1,76 @@ +# [v0.3.0] docs — write ADR-001 (Source Adapters) + +> **Confidence:** MEDIUM · **Wave:** trailing · **Slug:** `adr-001-source-adapters` +> Create with: `gh issue create -F .planning/issues/v0.3.0/05-adr-001-source-adapters.md -l documentation,priority:P2` +> Branch: `agent/-adr-001-source-adapters` + +## Context + +- **Per-issue context file (read first):** [`.planning/agent-context/adr-001-source-adapters.md`](../../agent-context/adr-001-source-adapters.md) +- Pipeline: [`AGENT-EXECUTION-PIPELINE.md`](../../../AGENT-EXECUTION-PIPELINE.md) +- Roadmap: [`STRATEGIC-ROADMAP-2026-05-29.md`](../../../STRATEGIC-ROADMAP-2026-05-29.md) — principles **2.1, 2.2, 2.7** +- Source-adapter touch-points (to describe, not change): `ingestion/cpython_versions.py`, `ingestion/sphinx_json.py`, `ingestion/inventory.py`, `services/package_docs.py` + +## Goal + +Record `docs/architecture/ADR-001-source-adapters.md`: the contract for canonical source connectors (CPython docs + PyPI metadata), establishing the layer-contract pattern that makes the architecture cloneable. + +## Acceptance criteria + +- [ ] `docs/architecture/ADR-001-source-adapters.md` exists and fills **every** section of the template embedded below. +- [ ] Status **Accepted**, date `2026-05-29`, decider `@ayhammouda`. +- [ ] The ADR documents the **two source adapters that exist today**: (1) CPython documentation source — cloned at a pinned ref, built via `sphinx-build -b json` (point at `ingestion/`); (2) PyPI metadata source — `lookup_package_docs` controlled metadata fetch (point at `services/package_docs.py`). 
+- [ ] It states principle **2.1** (canonical source only — no scraped mirrors) and **2.2** (offline-first at *query* time), and explicitly names the **one documented exception**: `lookup_package_docs` performs a controlled PyPI metadata lookup, which is a build/lookup-time network call, not a docs-query-time call. +- [ ] The "Layer Contract" section specifies the source-connector layer's inputs (version/identifier), outputs (canonical artifacts handed to ingestion), and invariants (pinned, reproducible, no third-party indexers), per principle 2.7. +- [ ] `uv run python-docs-mcp-server doctor` passes (docs-only change). + +## Scope boundaries + +**In scope:** one new ADR markdown file under `docs/architecture/`. + +**Out of scope (stop and comment):** changing any ingestion/service code; inventing source-adapter behavior not present in the code or roadmap; documenting adapters that don't exist yet (e.g. Rust/Go) beyond a one-line "future adopters clone this contract" note. + +## Forbidden-territory reminders (pipeline §2) + +- No code changes; no schema; no workflow edits. +- The ADR must describe **current** behavior accurately — verify each claim against the cited files before writing it. + +## Validation commands (pipeline §5) + +Run the canonical four-command gate from `AGENT-EXECUTION-PIPELINE.md` §5. No +change-type-specific additional gates apply (this is a docs-only change). + +## ADR template (use exactly this skeleton) + +```markdown +# ADR-001: Source Adapters + +- **Status:** Accepted +- **Date:** 2026-05-29 +- **Deciders:** @ayhammouda +- **Roadmap refs:** principles 2.1, 2.2, 2.7 + +## Context and Problem Statement +## Decision Drivers +## Considered Options +## Decision Outcome + +### Consequences +**Positive:** ... +**Negative / risks:** ... +## Layer Contract (principle 2.7) +- **Inputs:** ... +- **Outputs:** ... +- **Invariants:** ... 
+## Links +- STRATEGIC-ROADMAP-2026-05-29.md §2.1, §2.2, §2.7 +``` + +## PR template & recovery + +- Use `.github/PULL_REQUEST_TEMPLATE/agent.md`. Verify claims against the code before asserting them; cite file paths in the ADR. + +## Effort estimate + +~2 hours. diff --git a/.planning/issues/v0.3.0/06-cpython-source-sha-pin.md b/.planning/issues/v0.3.0/06-cpython-source-sha-pin.md new file mode 100644 index 0000000..a74ff87 --- /dev/null +++ b/.planning/issues/v0.3.0/06-cpython-source-sha-pin.md @@ -0,0 +1,61 @@ +# [v0.3.0] ingestion — pin CPython source by commit SHA + +> **Confidence:** PARTIAL (agent does the pin; human writes the SECURITY.md threat model) · **Wave:** trailing · **Slug:** `cpython-source-sha-pin` +> Create with: `gh issue create -F .planning/issues/v0.3.0/06-cpython-source-sha-pin.md -l area:build,compliance,priority:P1` +> Branch: `agent/-cpython-source-sha-pin` + +## Context + +- **Per-issue context file (read first):** [`.planning/agent-context/cpython-source-sha-pin.md`](../../agent-context/cpython-source-sha-pin.md) +- Pipeline: [`AGENT-EXECUTION-PIPELINE.md`](../../../AGENT-EXECUTION-PIPELINE.md) +- Roadmap: [`STRATEGIC-ROADMAP-2026-05-29.md`](../../../STRATEGIC-ROADMAP-2026-05-29.md) §4 (v0.3.0, build-time supply-chain hardening), decision **5.10** +- Touch-points: `ingestion/cpython_versions.py` (`CPythonDocsBuildConfig`, `CPYTHON_DOCS_BUILD_CONFIG`), `__main__.py:210–226` (the `git clone --depth 1 --branch ` call), `tests/test_ingestion.py:53` + +## Goal + +Make a pinned commit SHA — not a mutable tag — the integrity anchor for every CPython docs build, so a re-tagged or moved tag fails the build instead of silently changing canonical content. 
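The integrity check this goal implies could be sketched as below. It is a minimal illustration, not the code in `__main__.py`: `read_head_sha`, `verify_pinned_sha`, and `PinMismatchError` are hypothetical names, and the pinned value is a placeholder rather than a real CPython commit.

```python
import re
import subprocess

SHA_RE = re.compile(r"^[0-9a-f]{40}$")


class PinMismatchError(RuntimeError):
    """Raised when a checkout does not match its pinned commit SHA."""


def read_head_sha(repo_dir: str) -> str:
    """Return the commit SHA that HEAD points to in repo_dir (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def verify_pinned_sha(actual: str, expected: str) -> None:
    """Raise unless actual equals a well-formed expected SHA (no silent fallback)."""
    if not SHA_RE.fullmatch(expected):
        raise ValueError(f"pinned SHA is not 40-char lowercase hex: {expected!r}")
    if actual != expected:
        raise PinMismatchError(f"HEAD is {actual}, but the pinned SHA is {expected}")


# Placeholder SHA for demonstration only.
pinned = "0123456789abcdef0123456789abcdef01234567"
verify_pinned_sha(pinned, pinned)  # a matching checkout passes silently

try:
    verify_pinned_sha("f" * 40, pinned)
    mismatch_detected = False
except PinMismatchError:
    mismatch_detected = True
```

Keeping the comparison separate from the `git` call lets unit tests exercise the mismatch path without a live clone; in a build, the caller would pass `read_head_sha(clone_dir)` as `actual`.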
+ +## Acceptance criteria + +- [ ] `CPythonDocsBuildConfig` gains a `sha: str` field; each of the five entries in `CPYTHON_DOCS_BUILD_CONFIG` carries the 40-char lowercase-hex commit SHA that the existing `tag` currently resolves to (resolve via `git ls-remote https://github.com/python/cpython.git `). The `tag` field is **kept** for human readability with a comment noting the SHA is authoritative. +- [ ] After the clone in `__main__.py`, the code verifies `git -C rev-parse HEAD` equals `config["sha"]` and **aborts that version's build with a clear error** on mismatch (no silent fallback). The shallow `--branch ` fetch may stay; the SHA check is what enforces integrity. +- [ ] `tests/test_ingestion.py` asserts every config entry has a `sha` matching `^[0-9a-f]{40}$`, alongside the existing tag assertion at line 53. +- [ ] `uv run pytest tests/test_ingestion.py -q` passes. +- [ ] A draft SECURITY.md threat-model paragraph (the `build-index` CPython clone as the largest non-runtime attack surface, now SHA-pinned) is written **into the PR description and the context file's decision log** for a human to paste — `SECURITY.md` itself is **not** edited. + +## Scope boundaries + +**In scope:** `ingestion/cpython_versions.py`, the SHA-verification step in `__main__.py`, and `tests/test_ingestion.py`. + +**Out of scope (stop and comment):** +- Editing `SECURITY.md` (forbidden — draft the wording only). +- Changing the clone transport, the Sphinx pin, or any build behavior beyond adding the SHA verification. +- Bumping any tag/version to a newer CPython release — pin the SHA of the **current** tag only. + +## Forbidden-territory reminders (pipeline §2) + +- `SECURITY.md` — do not edit; provide draft text for human review (this is the "human" half of this PARTIAL issue). +- `.github/workflows/` — do not touch the release/CI path. +- `pyproject.toml [project]` — untouched. 
+ +## Validation commands (pipeline §5) + +Run the canonical four-command gate from `AGENT-EXECUTION-PIPELINE.md` §5, then +the change-specific gate below (ingestion-touching change): + +```bash +uv run python-docs-mcp-server validate-corpus +``` + +> Note: a full `build-index` clones CPython over the network and takes minutes; +> the unit tests in `tests/test_ingestion.py` cover the config/verification logic +> without a live clone. Do not gate the PR on a full multi-version build. + +## PR template & recovery (pipeline §6, §7) + +- This is a **human-review-required** PR: it touches the supply-chain integrity path and produces SECURITY.md wording for a human. Open the PR, add `🛑 needs-human-review`, do **not** request merge. Fill the "Why this triggered human review" section. +- Blocked (e.g. can't resolve a SHA offline)? Stop and comment per §8. + +## Effort estimate + +~2 hours. diff --git a/AGENT-EXECUTION-PIPELINE.md b/AGENT-EXECUTION-PIPELINE.md new file mode 100644 index 0000000..17b7b43 --- /dev/null +++ b/AGENT-EXECUTION-PIPELINE.md @@ -0,0 +1,287 @@ +# Autonomous Agent Execution Pipeline + +**Purpose:** Define the policy, context, and guardrails for running autonomous coding agents (Claude Code or similar) against this project's GitHub issues while the maintainer is AFK. + +**Companion to:** [`STRATEGIC-ROADMAP-2026-05-29.md`](STRATEGIC-ROADMAP-2026-05-29.md) (the *what*; this is the *how, with what guardrails*). + +**OpenClaw operating layer:** [`OPENCLAW-FORGE-PROTOCOL.md`](OPENCLAW-FORGE-PROTOCOL.md) defines how Vision, Gilfoyle, and Heimdall apply this policy. This project has no UI, so Saga is not part of the default loop. + +**Adopted:** 2026-05-29 + +--- + +## 1. Operating Principles + +- Agents work in branches, never on `main`. +- Every PR requires human review before merge. **No auto-merge, ever.** +- Agents declare their scope explicitly and stay inside it. +- The canonical validation gate (§5) must pass before any PR is opened. 
Failing gate → no PR, just a `WORKING-NOTES.md` on the branch + comment on the issue. +- Automated review tools such as CodeRabbit provide review signal only. They do not approve, merge, or override the human-review gate. +- Forbidden territory (§2) is non-negotiable. Any drift triggers a hard stop. +- Recovery is always **stop and post a comment**, never **silently expand scope**. + +The goal is to maximize what an agent can do unattended overnight, then catch anything that needed human judgment in a tight morning review. + +--- + +## 2. Forbidden Territory (hard stop) + +Autonomous agents may NOT modify the following without explicit human approval in the issue comments first: + +| Path / Concern | Reason | +|---|---| +| Any tool name, parameter, or return shape | Public API surface; semver-significant | +| `schema.sql` and migrations | Index schema; rebuilds existing user caches | +| `.github/workflows/` (any workflow) | CI/CD and supply chain | +| `.github/workflows/release.yml` specifically | Release path; Trusted Publishing config | +| `pyproject.toml` `[project]` (anything other than `version`) | Identity, dependencies, classifiers | +| Major dependency bumps (anything ≥1 major) | Compatibility risk | +| `.planning/POSITIONING.md` | Load-bearing brand asset | +| `README.md` hero section (above the first install code block) | Load-bearing brand asset | +| `LICENSE` | Permanent commitment (MIT, always free) | +| `CHANGELOG.md` (creating entries is fine; rewriting history is not) | Release history | +| `SECURITY.md` | Trust posture; requires deliberate review | +| Existing tests (deletion or weakening assertions) | Regression cover | +| `.planning/ROADMAP.md` historical phase records | Archival history | +| `AGENT-EXECUTION-PIPELINE.md`, `OPENCLAW-FORGE-PROTOCOL.md`, and `STRATEGIC-ROADMAP-2026-05-29.md` | Governing policy and strategy docs | + +If an agent's task appears to require touching any of these: +1. **Stop work.** +2. 
Post a comment on the issue explaining the conflict. +3. Tag with `🛑 needs-human-review`. +4. Wait for guidance. + +--- + +## 3. Issue Structure (required for every agent-targetable issue) + +Every issue intended for an autonomous agent **must** contain these sections, in this order: + +| Section | Purpose | Required content | +|---|---|---| +| **Title** | Routability | `[v0.X.Y] ` e.g., `[v0.3.0] cache — add zstd codec layer` | +| **Context** | Self-containment | Links to: this pipeline doc, the strategic roadmap, any specific ADR or `.planning/phases/0X-*` directory, prior related issues | +| **Goal** | Single sentence | What outcome counts as success | +| **Acceptance criteria** | Testable definition of done | Checkbox list per §4 | +| **Scope boundaries** | Prevents creep | "In scope:" and "Out of scope:" subsections | +| **Forbidden-territory reminder** | Belt and suspenders | Repeat the §2 items relevant to this issue | +| **Validation commands** | Pre-PR gate | The exact canonical commands per §5 | +| **PR template** | What the PR description must include | §6 checklist | +| **Recovery** | What to do if blocked | Pointer to §8 | +| **Effort estimate** | Sanity check | Rough hours; agent should bail and escalate if work exceeds 2× | + +An issue missing any of these is not agent-ready. The pre-flight checklist (§10) gates this. + +--- + +## 4. Acceptance-Criteria Patterns + +**Each criterion must be:** testable, atomic, achievable without touching forbidden territory, and sized so a competent dev could verify it in <5 minutes. + +**Good examples:** + +- "`uv run pytest tests/cache/test_codec.py -q` passes with at least 4 new tests covering: codec round-trip for `'none'`, codec round-trip for `'zstd'`, codec round-trip for `'zstd-dict-v1'`, and graceful read of pre-existing `compression='none'` rows." 
+- "After cherry-picking this branch, `python -c 'from mcp_server_python_docs.cache.codec import list_supported; print(list_supported())'` prints exactly `['none', 'zstd', 'zstd-dict-v1']`." +- "`grep -rn 'yaml.load(' src/ tests/` returns zero hits. `grep -rn 'yaml.safe_load(' src/` returns at least the expected call sites in `server.py` and `ingestion/sphinx_json.py`." +- "README.md `## Tools` section lists exactly six rows in the table, including `compare_versions`, and the row order matches the `@mcp.tool` declaration order in `src/mcp_server_python_docs/server.py`." + +**Bad examples (do not allow):** + +- "Improve cache performance." — not testable +- "Make it production-ready." — not specific +- "Refactor for clarity." — invites scope creep +- "Add tests." — what tests, asserting what? + +--- + +## 5. Canonical Validation Gate + +**Must pass, in this order, before any PR is opened:** + +```bash +uv run ruff check src/ tests/ +uv run pyright src/ +uv run pytest --tb=short -q +uv run python-docs-mcp-server doctor +``` + +**Additional gates for specific change types:** + +- Any change touching the MCP wire protocol or tool registration: + ```bash + uv run pytest tests/test_stdio_smoke.py -q + ``` +- Any change to ingestion or storage: + ```bash + uv run python-docs-mcp-server validate-corpus + ``` +- Any change touching dependencies: + ```bash + uv lock --check + uv pip compile --quiet pyproject.toml -o /tmp/requirements-check.txt + ``` + +**Failure handling:** If any gate fails, the agent writes the full output into a file `WORKING-NOTES.md` at the branch root, commits it as `agent: validation-gate-failed`, posts a comment on the issue with a link to the failing commit, and stops. **No PR is opened.** + +--- + +## 6. Branch, Commit, PR Conventions + +- **Branch name:** `agent/{issue-number}-{kebab-case-summary}` (e.g., `agent/47-zstd-cache-codec`) +- **Commit prefix:** `agent: ` followed by a short, imperative summary. 
Conventional-commit scopes optional but encouraged: `agent: cache(codec): add zstd round-trip path` +- **Atomic commits.** One logical change per commit. No squash-and-force-push during agent work. +- **PR title** matches the issue title verbatim +- **PR description** must include: + - `Closes #` (or `Refs #` if intentionally not closing) + - Each acceptance criterion as a checked or unchecked box, with a one-line explanation if unchecked + - Output (or link to artifact) for the §5 validation gate + - CodeRabbit triage summary when CodeRabbit comments on the PR: blocking, follow-up, false positive, or pending/unavailable + - A short "Why this approach" paragraph if the design wasn't fully prescribed in the issue + - The §7 "Why this triggered human review" disclosure (which doubles as a forbidden-territory near-miss log when applicable; CODEOWNERS is the mechanical enforcement) +- **PR is opened against** the milestone integration branch (e.g., `release/v0.3.0`) when one exists, otherwise `main`. Never auto-merge. + +--- + +## 7. 
Human-Review Triggers (always pause) + +The agent must open the PR but **NOT** request merge — and must add the `🛑 needs-human-review` label — if any of these are true: + +| Trigger | Why | +|---|---| +| Any forbidden-territory item (§2) was modified | By definition | +| Any existing test was deleted | Possible regression-cover loss | +| Diff exceeds 500 lines of source code (excluding generated and lockfiles) | Bigger than a single agent task should be | +| A new third-party runtime dependency was introduced | Trust-posture and footprint review | +| Any `pyproject.toml` field changed (other than `version` during a release issue) | Identity / metadata review | +| `.github/workflows/` was modified | CI/release-path review | +| The PR introduces network access at runtime | Violates principle 2.2 (offline-first) | +| The PR introduces async code in a previously-sync code path | Concurrency review | +| The agent's "Why this approach" paragraph cites a design choice not in the issue | Verify scope | + +For each trigger, the PR description must include a `## Why this triggered human review` section explaining what changed and why the agent believes it was necessary. + +--- + +## 8. Recovery Procedures + +If the agent encounters any of these conditions, **stop work** and post a comment on the issue: + +- A previously passing test now fails for an unclear reason +- A change to forbidden territory appears necessary to complete the task +- The acceptance criteria turn out to be ambiguous or contradictory +- An upstream dependency is broken or unavailable +- Work appears to exceed 2× the original effort estimate + +The stop comment must contain: +1. What was attempted (1–3 sentences) +2. What failed or blocked (with error output if applicable) +3. The agent's best read on the path forward +4. 
An explicit "I am stopping pending guidance" line + +**Forbidden recovery moves:** + +- Silently expanding scope +- Trying alternative implementations not specified in the issue +- Merging to `main` +- Deleting tests to make others pass +- Suppressing warnings or skipping tests as a "workaround" + +--- + +## 9. Context Files Required Before Agents Run + +These files must exist on `main` before the v0.3.0 issues are unleashed to autonomous agents. + +| File | Purpose | Status | +|---|---|---| +| [`AGENTS.md`](AGENTS.md) | Existing repo-conventions doc; should reference this pipeline | **Needs update** — add a one-paragraph link to this file | +| [`STRATEGIC-ROADMAP-2026-05-29.md`](STRATEGIC-ROADMAP-2026-05-29.md) | The *what and why*; mandatory reading | **Exists** | +| `AGENT-EXECUTION-PIPELINE.md` (this file) | The *how, with what guardrails* | **Exists** | +| [`OPENCLAW-FORGE-PROTOCOL.md`](OPENCLAW-FORGE-PROTOCOL.md) | OpenClaw role split and MCP-specific execution loop | **Exists** | +| `.github/ISSUE_TEMPLATE/autonomous-agent.yml` | Issue template enforcing §3 structure | **Create** — see §11 sketch | +| `.github/PULL_REQUEST_TEMPLATE/agent.md` | PR template enforcing §6 | **Create** — see §11 sketch | +| `.github/CODEOWNERS` | Forces human review on forbidden-territory paths | **Create** — see §11 sketch | +| `docs/architecture/TOKEN-STUDY-METHODOLOGY.md` | Methodology spec for the v0.3.0 first issue | **Create as part of that issue spec** | +| GitHub label: `🛑 needs-human-review` | Marks PRs paused at §7 triggers | **Create** | +| GitHub label: `agent-ready` | Confirms issue passed §10 pre-flight | **Create** | +| Branch protection on `main` | Requires at least one human approval before merge | **Confirm enabled** | +| Branch protection on `release/v0.3.0` (when created) | Same | **Configure at branch creation** | + +--- + +## 10. 
Pre-flight Checklist (run before unleashing agents on a milestone) + +Run this checklist before pushing the first agent-ready issue to the queue. + +- [ ] All §9 context files exist on `main`. +- [ ] The §5 canonical validation gate passes on `main` (clean baseline). +- [ ] Each issue has been read end-to-end by a human and labeled `agent-ready`. +- [ ] Each issue includes its §3 sections in full. +- [ ] The `🛑 needs-human-review` and `agent-ready` labels exist in the repo. +- [ ] CODEOWNERS forces review on at least: `pyproject.toml`, `.github/workflows/`, `LICENSE`, `README.md`, `.planning/POSITIONING.md`, `schema.sql`. +- [ ] Branch protection on `main` requires ≥1 human approval before merge. +- [ ] At least one issue is small enough (≤4 hours) to serve as a confidence-building first run. + +--- + +## 11. Templates (authoritative implementations) + +The §9 templates live as checked-in files; this section just points at them so +this doc never drifts from the implementations: + +- `.github/ISSUE_TEMPLATE/autonomous-agent.yml` +- `.github/PULL_REQUEST_TEMPLATE/agent.md` +- `.github/CODEOWNERS` + +--- + +## 12. Per-Issue Context Files (v0.3.0 wave) + +For the first wave of agent-targetable issues, Claude Code should include — as linked references in each issue — a dedicated `.planning/agent-context/.md` file that captures the agent's working notes, design decisions to honor, and any test fixtures or code excerpts the agent will need to look at. + +Each per-issue context file should contain: + +1. **The relevant excerpt from `STRATEGIC-ROADMAP-2026-05-29.md`** (don't make the agent re-derive the goal). +2. **Pointer to the existing code touch-points** (file paths + symbols). +3. **Existing test patterns to follow** (one or two example tests from the same area). +4. **Known pitfalls** specific to this task. +5. **Decision log placeholder** for the agent to fill in. 
+ +The point is to give the agent everything it needs in one read, so it doesn't go fishing across the repo and pick up incorrect patterns from `.planning/` archive material. + +--- + +## 13. Suggested Agent-Targetable Issues for v0.3.0 + +Mapping the v0.3.0 deliverables to agent-friendliness, to help prioritize issue generation: + +| Deliverable | Agent-friendly? | Why | Recommended owner | +|---|---|---|---| +| Workstream J — zstd cache codec layer | **Yes (high)** | Well-bounded, testable, no API surface change | Agent | +| README + PyPI + glama.json refresh to 6-tool surface | **Yes (high)** | Mechanical, easily verified, low risk | Agent | +| PyYAML safe-loader audit | **Yes (medium)** | Simple grep + fix; need agent to surface findings before changing | Agent | +| ADR-001 (Source Adapters) draft | **Yes (medium)** | Writing task; needs clear template + style guide | Agent with strict template | +| ADR-006 (Serialization) draft | **Yes (medium)** | Same as ADR-001 | Agent with strict template | +| Build-time supply-chain hardening (CPython SHA pin + SECURITY.md update) | **Partial** | Pinning is mechanical; SECURITY.md text needs judgment | Agent for pinning; human for SECURITY.md | +| 30-minute TOON Python port audit | **No** | Requires subjective quality judgment | Human | +| Empirical token study | **No** | Methodology choices and corpus selection require judgment | Human (with agent scaffolding the harness) | + +The v0.3.0 issue wave should therefore lead with the **high-confidence agent issues** so the overnight run produces obvious wins, then escalate to the partial / human-judgment items the following day with the maintainer at the keyboard. + +--- + +## Amendments + +*Append amendments below as `## Amendment YYYY-MM-DD` sections. 
Do not edit historical content above this line; the locked sections are the authoritative current policy.* + +## Amendment 2026-05-29 — OpenClaw Role Split + +OpenClaw execution for this repo is governed by `OPENCLAW-FORGE-PROTOCOL.md`. +The default loop is Vision → Gilfoyle → Heimdall → Vision/Aymen: + +- Vision owns issue pre-flight, `agent-ready`, review synthesis, branch protection, and pause/resume decisions. +- Gilfoyle owns scoped implementation on exactly one issue at a time. +- Heimdall owns independent verification, packaging/install smoke, security-sensitive checks, and release-readiness checks. +- CodeRabbit findings are mandatory review signal when present. Vision/Heimdall must triage them as blocking, follow-up, or false positive before `verified`. +- Saga is not in the default loop because this MCP has no UI. +- Pipeline Monitor remains disabled unless Aymen explicitly asks for assisted merge checks; no auto-merge is allowed. diff --git a/AGENTS.md b/AGENTS.md index 931782c..b25eb92 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -11,6 +11,19 @@ The repo's public credibility matters. Prefer changes that make the project easier to trust, easier to verify, and easier to contribute to over changes that merely add more AI or MCP setup. +## Autonomous Agent Execution + +A portion of this project is executed by autonomous coding agents working +unattended against GitHub issues. If you are such an agent — or are scoping +work for one — **`AGENT-EXECUTION-PIPELINE.md` is mandatory reading before you +touch anything.** It defines the forbidden territory (don't-touch paths), the +required issue structure, the canonical validation gate, the human-review +triggers, and the recovery procedure. The *what and why* lives in +`STRATEGIC-ROADMAP-2026-05-29.md`; the *how, with what guardrails* lives in the +pipeline doc. Agents work on branches only, every PR needs human review, and +auto-merge is forbidden. 
Per-issue working context lives under +`.planning/agent-context/.md`. + ## Canonical Commands If `uv` is not installed yet: @@ -87,6 +100,9 @@ Before calling work complete: - `.planning/ROADMAP.md` and `.planning/phases/0X-…/0X-CONTEXT.md` are live, forward-looking specs — read these first when starting a new phase. +- `.planning/agent-context/.md` and active + `.planning/issues/v0.3.0/*.md` specs are live only for autonomous-agent issue + execution; read them when an issue or pipeline document links them. - Anything else in `.planning/` (especially content dated 2026-04 or earlier) is archival history. It may help maintainers reconstruct prior context but should not drive routine implementation decisions. diff --git a/OPENCLAW-FORGE-PROTOCOL.md b/OPENCLAW-FORGE-PROTOCOL.md new file mode 100644 index 0000000..f36ff43 --- /dev/null +++ b/OPENCLAW-FORGE-PROTOCOL.md @@ -0,0 +1,294 @@ +# OpenClaw Forge Protocol — python-docs-mcp-server + +**Adopted:** 2026-05-29 +**Status:** Active once merged with `AGENT-EXECUTION-PIPELINE.md` +**Scope:** OpenClaw orchestration for autonomous work on `ayhammouda/python-docs-mcp-server` + +This document defines how OpenClaw agents execute the roadmap for this MCP server. +The repo has no product UI, so the old e-commerce forge shape does not apply: +there is no visual QA lane, no Vercel preview lane, and no design approval gate. + +The core loop is: + +- **Vision** plans, gates, reviews, and protects the repo. +- **Gilfoyle** implements one scoped issue at a time. +- **Heimdall** verifies behavior, packaging, security posture, and release readiness. +- **CodeRabbit** provides automated review signal that Heimdall and Vision must triage. +- **Aymen** remains the final human review authority for protected merges. + +`AGENT-EXECUTION-PIPELINE.md` remains the binding repo policy. This protocol is +the OpenClaw operating layer for applying that policy. + +--- + +## 1. Role Map + +| Role | Agent | Responsibility | May modify code? 
| May merge? | +|---|---|---|---|---| +| Supervisor | Vision (`main`) | Issue pre-flight, labels, branch protection, final review synthesis, stuck-work decisions | Yes, for protocol/config/documentation fixes | No auto-merge | +| Implementer | Gilfoyle (`arch`) | Implement exactly one `agent-ready` issue, open/update one PR, run the canonical gate | Yes | No | +| Verifier | Heimdall (`test`) | Independently validate PR behavior, test evidence, packaging/install smoke, security/release risks | Only test artifacts or diagnostic notes when explicitly assigned | No | +| Automated reviewer | CodeRabbit | Static review comments, maintainability findings, and security-adjacent review signal | No | No | +| Designer | Saga (`design`) | Not in the default loop; no UI exists | No | No | +| Merger | Pipeline Monitor (`merge`) | Disabled for this repo unless Aymen explicitly asks for assisted merge checks | No | No auto-merge | + +No agent may claim to be Vision, Aymen, or a maintainer. Agent comments must use +their own role name and must not invoke supervisor override language. + +--- + +## 2. 
Flow + +```mermaid +flowchart TD + A[Vision reviews roadmap + issue spec] --> B{Issue passes pre-flight?} + B -- no --> C[Vision fixes spec or labels needs-human-review] + B -- yes --> D[Vision applies agent-ready] + D --> E[Gilfoyle creates agent issue branch] + E --> F[Gilfoyle implements within scope] + F --> G{Canonical gate green?} + G -- no --> H[Commit WORKING-NOTES.md + stop] + G -- yes --> I[Gilfoyle opens PR] + I --> R[CodeRabbit automated review] + I --> J[Heimdall independent verification] + R --> S[Vision/Heimdall triage findings] + J --> K{Verifier + review triage pass?} + S --> K + K -- no --> L[Heimdall or Vision labels verification-failed and comments exact failures] + L --> E + K -- yes --> M[Heimdall labels verified] + M --> N[Vision review synthesis] + N --> O{Human approval?} + O -- no --> P[Changes requested or needs-human-review] + O -- yes --> Q[Aymen/Vision merges manually after protected checks] +``` + +The flow is deliberately slower than the Alto pipeline. This project is a public +developer tool with a small API surface; one bad auto-merge damages trust faster +than it saves time. + +--- + +## 3. Labels + +The repo should use these labels for the OpenClaw loop: + +| Label | Set by | Meaning | +|---|---|---| +| `agent-ready` | Vision only | Issue passed pre-flight and may be picked up by Gilfoyle | +| `agent-in-progress` | Gilfoyle | Gilfoyle has claimed the issue | +| `agent-pr-opened` | Gilfoyle | Implementation PR exists | +| `verification-needed` | Gilfoyle | PR is ready for Heimdall | +| `verified` | Heimdall | Independent verification passed | +| `verification-failed` | Heimdall | Verification failed; comment contains exact reproduction | +| `🛑 needs-human-review` | Any agent | Human judgment required before further automation | + +Only one of `verification-needed`, `verified`, and `verification-failed` should +be present on a PR at a time. + +--- + +## 4. Vision Protocol + +Vision owns the queue. 
+ +Before labeling an issue `agent-ready`, Vision must verify: + +- The issue has every required section from `AGENT-EXECUTION-PIPELINE.md` §3. +- The issue links its `.planning/agent-context/.md` file. +- The issue has clear in-scope and out-of-scope boundaries. +- The acceptance criteria are executable in under five minutes each. +- The canonical validation gate is green on current `main`. +- `main` branch protection requires one approving review and Code Owner review. +- The issue does not require spending money, external communication, secret + rotation, architecture policy changes, or public API design judgment. + +Vision also owns PR review synthesis: + +- Check the PR diff against forbidden territory. +- Compare Heimdall's verification comment with Gilfoyle's claimed evidence. +- Read CodeRabbit findings and classify each as blocking, non-blocking follow-up, + or false positive. +- Decide whether to request changes, add `🛑 needs-human-review`, or approve + for Aymen's final merge. + +Vision may directly patch planning/protocol files when the gap is in the forge +itself, but feature implementation should normally go through Gilfoyle. + +--- + +## 5. Gilfoyle Protocol + +Gilfoyle owns implementation. + +Per cycle, Gilfoyle must: + +1. Pick exactly one open issue labeled `agent-ready` and not labeled + `agent-in-progress`. +2. Add `agent-in-progress` to the issue. +3. Create branch `agent/-`. +4. Read only: + - `AGENTS.md` + - `AGENT-EXECUTION-PIPELINE.md` + - this protocol + - the linked per-issue context file + - directly relevant source/tests +5. Implement only the scoped change. +6. Run the canonical gate: + ```bash + uv run ruff check src/ tests/ + uv run pyright src/ + uv run pytest --tb=short -q + uv run python-docs-mcp-server doctor + ``` +7. Open a PR only if the gate is green. +8. Add `agent-pr-opened` and `verification-needed`. + +Gilfoyle must stop and comment if: + +- Any forbidden-territory path appears necessary. 
+- Tests fail for unclear reasons. +- The issue spec contradicts repo reality. +- The diff exceeds the issue's expected size by more than 2x. +- A runtime dependency or public tool contract change is needed. + +Gilfoyle must not merge, approve, dismiss reviews, or add `verified`. + +--- + +## 6. Heimdall Protocol + +Heimdall owns verification, not UI testing. + +For each PR labeled `verification-needed`, Heimdall must independently run: + +```bash +uv run ruff check src/ tests/ +uv run pyright src/ +uv run pytest --tb=short -q +uv run python-docs-mcp-server doctor +``` + +Then add targeted checks based on touched files: + +| Change type | Additional verification | +|---|---| +| MCP tool registration or protocol behavior | `uv run pytest tests/test_stdio_smoke.py -q` | +| Packaging / metadata / README / Glama | Build wheel/sdist locally and inspect package metadata | +| Cache/storage behavior | Run focused cache/storage tests and verify existing cache compatibility | +| Ingestion/version code | Run focused ingestion/version tests and, when feasible, `validate-corpus` | +| Security-sensitive parsing | Grep for unsafe APIs and confirm trust boundary documentation | +| ADR/docs-only PR | Verify links, file paths, command references, and forbidden-territory claims | + +Heimdall must also read CodeRabbit's review before applying `verified`. +CodeRabbit is not authoritative, but unresolved blocking findings must prevent +`verified`. + +Heimdall comments with: + +- Commit SHA verified. +- Exact commands run. +- Pass/fail result. +- CodeRabbit triage summary: blocking / follow-up / false positive. +- Any risk not covered by tests. +- Final label action. + +If verification passes, Heimdall replaces `verification-needed` with `verified`. +If it fails, Heimdall replaces `verification-needed` with `verification-failed` +and posts exact reproduction steps. Heimdall must not request merge. + +--- + +## 7. CodeRabbit Protocol + +CodeRabbit is part of review signal, not governance. 
+ +Required handling: + +1. Wait for the CodeRabbit check or review comment when it appears on a PR. +2. Read every CodeRabbit finding that applies to the current PR head. +3. Classify each finding: + - **Blocking:** correctness, security, public API drift, broken tests, + packaging/release risk, forbidden-territory drift, or real maintainability + issue inside the PR scope. + - **Follow-up:** valid but outside the issue scope or not worth expanding + the current PR. + - **False positive:** inaccurate, contradicted by tests, or based on a + misunderstanding of repo architecture. +4. Blocking findings must be fixed by Gilfoyle before `verified`. +5. Follow-up findings may become new issues if Vision agrees. +6. False positives should be acknowledged in Heimdall or Vision's review + summary so Aymen does not have to re-triage them. + +CodeRabbit cannot: + +- Override the canonical validation gate. +- Approve a PR. +- Request merge. +- Bypass Code Owner review. +- Expand an issue's scope. + +If CodeRabbit is unavailable or delayed, Vision may proceed after Heimdall +verification, but the PR summary must explicitly say CodeRabbit was unavailable +or still pending. Do not pretend a missing review is green. + +--- + +## 8. Automation Mode + +Initial v0.3.0 execution should be manual-triggered, not recurring cron. + +Recommended launch sequence: + +1. Merge the planning PR. +2. Confirm branch protection and labels. +3. Vision labels only one starter issue `agent-ready`. +4. Manually run Gilfoyle once. +5. Manually run Heimdall on the resulting PR. +6. Review the process, then decide whether to add short-lived crons. + +Recurring crons are allowed only after two clean manual cycles. If enabled, use +short-lived project-specific jobs with explicit repo names and delete them after +the milestone. Do not reuse Alto cron prompts or webhook relay assumptions. 
+ +```mermaid +stateDiagram-v2 + [*] --> ManualOnly + ManualOnly --> LimitedCron: two clean manual cycles + LimitedCron --> ManualOnly: first protocol violation + LimitedCron --> Removed: milestone complete + ManualOnly --> Removed: queue paused +``` + +--- + +## 9. First Wave + +Start with the lowest-risk issues after the planning PR lands: + +1. README / PyPI / `glama.json` six-tool refresh. +2. PyYAML safe-loader audit. +3. ADR-006 serialization draft. +4. ADR-001 source adapters draft. + +Delay zstd cache work until the dependency and dictionary/context API are +explicitly resolved by a maintainer-prep change. Delay CPython SHA pinning until +the SECURITY.md prose boundary is clear. + +--- + +## 10. Stop Conditions + +Pause the forge and remove `agent-ready` from the queue if any of these happen: + +- A PR modifies forbidden territory without an explicit issue comment approving it. +- Gilfoyle works on more than one issue in a cycle. +- Heimdall verifies a different commit than the PR head. +- A PR is marked `verified` while a CodeRabbit blocking finding is unresolved. +- Any agent adds merge/approval language. +- Any job uses Alto/Shopify/Vercel-specific assumptions. +- The baseline canonical gate fails on `main`. + +When paused, Vision writes a short incident note and fixes the protocol before +new work resumes. Small pauses are cheaper than turning a public repo into a +committee-authored incident report. diff --git a/STRATEGIC-ROADMAP-2026-05-29.md b/STRATEGIC-ROADMAP-2026-05-29.md new file mode 100644 index 0000000..3b75302 --- /dev/null +++ b/STRATEGIC-ROADMAP-2026-05-29.md @@ -0,0 +1,271 @@ +# Strategic Roadmap — python-docs-mcp-server + +**Adopted:** 2026-05-29 +**Status:** Active. This document is the canonical forward-looking strategy; review at each minor release. +**Supersedes / consolidates:** the four prior strategy artifacts listed in §7. + +--- + +## 1. 
Mission + +**The canonical, token-frugal Python stdlib oracle for AI coding agents — architected to be cloned.** + +Said longer: be the server AI coding agents reach for first when a Python stdlib question comes up, returning exact symbols, exact sections, and exact versions from CPython source itself — offline, always free, always MIT, token-efficient by design. Ship the architecture clearly enough that adopters can clone the pattern for other documentation ecosystems (Rust, Go, Node) without reinventing the design. + +Two audiences, one product: + +- **AI coding agents and their users.** Claude, Cursor, Codex, and the developers using them get a fast, deterministic, canonical answer to any Python stdlib question. +- **MCP authors building docs servers for other languages.** The project's ADRs, layered architecture, and (eventually) template repo make cloning the pattern a weekend's work instead of a quarter's. + +The label *"reference architecture"* is **not** claimed externally. If the writing earns it as a community verdict over 12 months, the label sticks for free; if it doesn't, the project is not on the hook for an overclaim. + +### 1.1 How we know we won + +| Signal | Target by v0.5.0 | Target by v1.0.0 | +|---|---|---| +| PyPI installs / month | 5,000 | 25,000 | +| GitHub stars | 1,000 | 5,000 | +| Citations by other MCP authors / blog posts | 5 | 25 | +| External adopter cloning the architecture for another language | 0 (acceptable) | ≥1 | +| Default-listed in at least one major coding agent's setup docs | 0 (acceptable) | ≥1 | +| Token / correctness benchmark cited as canonical for Python docs MCPs | 1 (own publication) | 3+ (third-party references) | + +These targets are deliberately aggressive on the high end and forgiving on the low end. The v1.0.0 numbers correspond to "credibly top-tier for the docs-MCP category"; the v0.5.0 numbers correspond to "moving in the right direction post-launch." + +--- + +## 2. 
Architectural Principles (locked) + +These are the principles future decisions must respect. Reopening any of them requires a deliberate amendment to this roadmap. + +| # | Principle | Why | +|---|---|---| +| 2.1 | **Canonical source only.** CPython at a pinned tag for stdlib docs; PyPI metadata API for package URLs. No scraped mirrors. No third-party indexers. | Correctness and version-accuracy are the moat. | +| 2.2 | **Offline-first runtime.** No network access at query time. The server is a local CDN edge over canonical docs. | Determinism, no rate limits, no API key surface. | +| 2.3 | **Always MIT, always free.** No paid tier, no closed-source extensions, no usage caps — ever. | Permanent positioning anchor (decision 5.1). | +| 2.4 | **Storage stays SQLite + markdown.** Storage format is closed; not re-openable in v0.x. | Universal, debuggable, greppable; FTS5 needs uncompressed text; markdown remains the right canonical body format for prose. | +| 2.5 | **Wire format is explicit and pluggable on structured tools only.** Compact JSON default; TOON opt-in if and only if the empirical study supports it. `get_docs` stays markdown. | Token economy is empirical, not architectural. | +| 2.6 | **Cache-first as a mental model.** Cold origin → warm index → hot derived-response cache → in-memory LRU. Every layer rebuildable from the layer above. | Justifies the architecture as "a CDN edge for docs in your editor." | +| 2.7 | **Layered design with stable contracts.** Eight layers: source connector / ingestion / storage / retrieval / budget / serializer / cache / transport. Contracts between layers are documented and stable. | What makes the pattern cloneable for other doc ecosystems. | +| 2.8 | **Strong trust posture.** MIT, OpenSSF Scorecard, CodeQL clean, attested releases via PyPI Trusted Publishing, build-time supply-chain threat model documented. | Differentiation vs cloud-first competitors who can't verify equivalently. | + +--- + +## 3. 
Where We Are (v0.2.1 baseline) + +**Shipped (2026-04 → 2026-05-29):** + +- PyPI publish path live (v0.1.5 → v0.1.6 → v0.2.0 → v0.2.1). +- Six MCP tools: `search_docs`, `get_docs`, `lookup_package_docs`, `list_versions`, `detect_python_version`, `compare_versions`. +- Python versions 3.10 – 3.14 indexed. +- Local SQLite + FTS5 index built from CPython source via `sphinx-build -b json`. +- Retrieved-docs cache, request-keyed, scoped to `index.db` fingerprint. +- Trusted Publishing with Sigstore attestations; OpenSSF Scorecard published; CodeQL clean. +- Proactive transitive bumps for CVE-2026-45409 (`idna` ReDoS) and PYSEC-2026-161 (`starlette` BadHost — explicitly affects MCP servers). +- Python 3.14 `fork`→`forkserver` regression patched (Sphinx parallel-build pickling issue). +- Positioning anchor: `.planning/POSITIONING.md` with per-surface adapter contract. + +**Not yet shipped (the road ahead):** + +- Empirical token study on Claude's tokenizer, with client-rewrap measurement. +- App-level zstd compression on the retrieved-docs cache. +- `format` parameter on the three structured tools. +- Architecture documentation (ADRs + design document). +- Public benchmark harness against all eligible docs MCPs. +- Personal blog + launch post. +- Phases 10 and 11 (`whatsnew_for_version`, `detect_python_version` v2 venv-aware). +- README / PyPI description refresh to reflect the 6-tool surface (still lists 5 in some surfaces). + +--- + +## 4. Milestone Roadmap + +Versioning follows semver. Behavior-additive changes (new tools, new optional parameters) are minor; bug fixes are patch. + +### v0.3.0 — Measurement, Compression, Hygiene *(target: 4 weeks)* + +The "instrument and tighten" release. Lays the empirical and operational foundation for everything that follows. **This is the most important release on the roadmap** because its outputs gate the v0.3.x and v0.5.0 decisions. + +| Deliverable | Notes | +|---|---| +| **Empirical token study** | One afternoon. 
Uses Anthropic's free token-counting API as the primary instrument (accepts the full structured-message envelope including tools). Measures both **token cost and serialization latency** per tool family. **Crucially measures client-side rewrap** — sends the same tool response through Claude Desktop / Cursor / Codex and observes what actually lands in the model context. Output: `docs/architecture/TOKEN-STUDY.md`. | +| **Workstream J — app-level zstd cache compression** | Targets retrieved-docs cache value column only. Trained dict on representative `get_docs` corpus. Codec column for forward-compat. Expected ratio strong because zstd's dictionary mode is documented as especially effective on small correlated records — exactly the cache-entry shape. | +| **30-minute TOON Python port audit** | Decides whether `format="toon"` is operationally viable in v0.3.x. If the port is unmaintained, ship JSON-only. | +| **README + PyPI + glama.json refresh** | Reflect the 6-tool surface including `compare_versions`. Adopt as a release-cycle discipline going forward (decision 5.9): every release updates the public-facing tool table. | +| **Build-time supply-chain hardening** | Pin CPython source by SHA, not by tag. Document the threat model in SECURITY.md (the `build-index` CPython clone is the largest non-runtime attack surface). Verify Sphinx-build environment isolation. | +| **PyYAML safe-loader audit** | `synonyms.yaml` is loaded at startup; confirm only `yaml.safe_load` is used; document the trust boundary. | +| **ADR-001 (Source Adapters) and ADR-006 (Serialization)** | First two of the eight ADRs. Establishes the layer-contract pattern. ADR-006 specifically enables the v0.3.x format parameter work. | + +### v0.3.x — Format Parameter *(timing: gated by v0.3.0 study)* + +The "selective serialization" release(s). Adds the `format` parameter to the three structured tools per locked decision 5.5.
+
+| Deliverable | Notes |
+|---|---|
+| `format` on `search_docs`, `list_versions`, `compare_versions` | JSON default. Always available. Existing clients see no behavior change unless they opt in. |
+| `format="toon"` opt-in | **Only if** v0.3.0 study shows a meaningful token win on Claude's tokenizer **after client rewrap**, with acceptable latency. If the study fails this bar, the `format` parameter ships JSON-only and TOON is deferred indefinitely. |
+| ADR-006 published as a standalone blog post | First post on the new blog. Anchors the personal brand on the architecture work. |
+
+### v0.4.0 — Phase 10 + Phase 11 *(target: 8 weeks after v0.3.0)*
+
+The "venv-aware" release. Adds the two remaining differentiating tools from the competitive brief.
+
+| Deliverable | Notes |
+|---|---|
+| `whatsnew_for_version(version)` | Section-sliced "What's New" page sourced from CPython `whatsnew/*.rst`. Reuses the multi-version index plumbing. |
+| `detect_python_version` v2 (venv-aware) | Reads `VIRTUAL_ENV`, `.venv/pyvenv.cfg`, `pyproject.toml` `requires-python`, `.python-version`. Auto-routes subsequent queries to the detected version. |
+| ADRs 2 – 5 | Ingestion, Storage, Retrieval, Budget. |
+
+### v0.5.0 — Architecture Documentation & Launch *(target: 12 weeks after v0.3.0)*
+
+The "design out loud" release. The architecture documentation becomes complete enough to support external adoption.
+
+| Deliverable | Notes |
+|---|---|
+| ADRs 7 and 8 | Cache, Transport. |
+| `docs/architecture/DESIGN.md` | 5-page design document tying the ADRs together. |
+| **Public benchmark harness** | All eight target docs MCPs + no-MCP baseline; 50-question Python eval covering symbols, concepts, cross-version, and PEP-adjacent. Reproducible from a clean clone. Methodology disclosure mandatory. |
+| **Launch post: "Canonical Python stdlib for your AI agent"** | Lede is the `compare_versions` demo + benchmark headline. Cross-posted to dev.to and Show HN. Published on the personal blog (live since the ADR-006 post). |
+| PyCon / EuroPython CFP submitted | Talk anchored on the architecture work. |
+
+### v1.0.0 — API Freeze *(target: ~6 months from now)*
+
+The "stable" release. Public API frozen; breaking changes would require v2.
+
+| Deliverable | Notes |
+|---|---|
+| API freeze across all tools | Semver discipline kicks in fully. |
+| Deprecation policy + security disclosure docs | Lifecycle commitments visible. |
+| `docs-mcp-template` (decision gate, §6 q1) | Ship **only if** at least one external adopter has signaled interest by v0.5.0. Otherwise defer indefinitely. |
+| Optional Streamable HTTP transport (§6 q3) | Ship behind a flag if there is a clear remote-server use case by v0.5.0. The architectural separation already supports both. |
+
+---
+
+## 5. Locked Decisions
+
+Consolidated from prior artifacts and this consolidation.
+
+| # | Decision | Source / Date |
+|---|----------|---------------|
+| 5.1 | Always MIT, always free, no paid tier ever. | Change request §9.5 (2026-05-14) |
+| 5.2 | Repo rename to `python-stdlib-mcp` deliberately dropped in v0.1.5; revisit no earlier than v1.0. | Change request §9.2 reversed (2026-05-14) |
+| 5.3 | Storage stays SQLite + markdown. TOON-as-storage killed. | Brainstorm §0.1 (2026-05-29) |
+| 5.4 | Empirical Claude-tokenizer study gates the `format="toon"` decision. | Brainstorm §0.2 (2026-05-29) |
+| 5.5 | `format` parameter on `search_docs`, `list_versions`, `compare_versions` only. JSON default; TOON opt-in. `get_docs` stays markdown. | Brainstorm §0.3 (2026-05-29) |
+| 5.6 | "Reference architecture" label dropped externally; the writing work ships anyway. | Brainstorm §0.4 (2026-05-29) |
+| 5.7 | App-level zstd on retrieved-docs cache, no gate. Versioned codec column for forward-compat. | Brainstorm §0.5 (2026-05-29) |
+| 5.8 | Empirical study measures **client-side rewrap**, not just raw payload tokens. Uses Anthropic's free token-counting API as primary instrument. Reports **tokens AND latency** per tool family. | Deep-research integration (2026-05-29) |
+| 5.9 | README / PyPI description / glama.json refresh to reflect the 6-tool surface; this becomes a release-cycle discipline going forward. | Deep-research integration (2026-05-29) |
+| 5.10 | Build-time supply chain (the `build-index` CPython clone) is an explicit risk area; threat model documented in SECURITY.md; CPython source pinned by SHA. | Deep-research integration (2026-05-29) |
+| 5.11 | PyYAML safe-loader-only discipline; `synonyms.yaml` is the only YAML input and is packaged with the wheel. | Deep-research integration (2026-05-29) |
+| 5.12 | Autonomous agents work only via the issue-and-PR flow defined in `AGENT-EXECUTION-PIPELINE.md`. Direct commits to `main` are forbidden; auto-merge is forbidden. | Agent-pipeline addition (2026-05-29) |
+| 5.13 | Forbidden-territory list in `AGENT-EXECUTION-PIPELINE.md` §2 is binding on all agents. | Agent-pipeline addition (2026-05-29) |
+| 5.14 | Every agent-targetable issue must have a per-issue context file under `.planning/agent-context/.md`. | Agent-pipeline addition (2026-05-29) |
+
+---
+
+## 6. Open Questions
+
+Not yet locked. Each should be resolved within the next 2 – 4 weeks.
+
+1. **`docs-mcp-template` ship/skip.** Adopt "defer; ship only if external adoption signals" (recommended), or commit now to building it by v0.5.0?
+2. **Cross-tokenizer claims.** Run §3 study only on Claude's tokenizer (decisive for product behavior), or also GPT-5 and Gemini for comparative analysis in the design document?
+3. **HTTP transport.** Stay stdio-only through v1.0 (recommended), or add a streamable HTTP adapter behind a flag in v0.5? Current architectural separation supports both; user-facing surface is bigger with HTTP.
+4. **Pre-built index hosting.** Ship `python-docs-mcp-server install-index` that downloads a pre-built `index.db` from GitHub Release assets, so users skip the multi-minute Sphinx build? Worth doing in v0.3.0 if bandwidth cost is acceptable.
+
+---
+
+## 7. Supporting Artifacts
+
+| Artifact | Role | Status |
+|---|---|---|
+| `AGENT-EXECUTION-PIPELINE.md` | Autonomous-agent policy, guardrails, validation gates, templates | Active; load-bearing for §9 |
+| `OPENCLAW-FORGE-PROTOCOL.md` | OpenClaw role split for this MCP: Vision supervises, Gilfoyle implements, Heimdall verifies, Saga excluded by default because there is no UI | Active; operating layer for §9 |
+| `competitive-brief.docx` | Original market positioning analysis (Context7, Ref.tools, arabold, DeepWiki, GitMCP, etc.) | External/private reference; not required in a clean checkout |
+| `CHANGE-REQUEST-v0.1.5-launch.md` | Implementation plan for v0.1.5 launch — executed | External/private historical reference; rename dropped, otherwise complete |
+| `ARCHITECTURE-BRAINSTORM-FEEDBACK-2026-05-29.md` | TOON / cache-first / reference-architecture brainstorm with the original locked decisions | External/private reference; superseded by §2 and §5 of this roadmap |
+| Deep-research report (uploaded 2026-05-29) | Independent third-party audit; validation of locked decisions + §3 study methodology refinements | Folded into §4 v0.3.0 and §5.8 – 5.11 |
+| `.planning/ROADMAP.md` | Engineering phase-by-phase plan (v0.1.0 execution) | Historical; phase 9 complete; 10 – 11 scaffolds active |
+| `.planning/POSITIONING.md` | Per-surface adapter contract for the positioning sentence | Active; load-bearing |
+| `CHANGELOG.md` | Keep-a-Changelog release history | Active |
+
+---
+
+## 8. Next Three Concrete Moves
+
+1. **Run the empirical token study.** One afternoon. Anthropic token-counting API as the primary instrument; measures client-side rewrap by running the same tool response through Claude Desktop / Cursor / Codex; reports both tokens and latency per tool family. Output: `docs/architecture/TOKEN-STUDY.md`.
+2. **Ship Workstream J (zstd cache).** Any free day. Trained dict on representative `get_docs` corpus; versioned codec column.
+3. **Refresh README + PyPI + glama.json** to reflect the 6-tool surface. ~10-minute PR. Establishes the release-cycle discipline of decision 5.9.
+
+After those three, the v0.3.0 milestone is unlocked end-to-end and the v0.3.x format-parameter work can begin.
+
+---
+
+## 9. Autonomous-Agent Execution
+
+A material portion of this roadmap will be executed by autonomous coding agents (Claude Code or similar) working unattended against GitHub issues. The execution policy, guardrails, forbidden territory, validation gates, and per-issue context-file requirements live in a companion document:
+
+[`AGENT-EXECUTION-PIPELINE.md`](AGENT-EXECUTION-PIPELINE.md)
+
+That file is **mandatory reading** before any agent-targetable issue is generated. It defines:
+
+- **Forbidden territory** (the don't-touch list — public API, schema, workflows, brand assets).
+- **Issue structure** every agent-ready issue must contain.
+- **Acceptance-criteria patterns** that are testable rather than vague.
+- **The canonical validation gate** (ruff → pyright → pytest → doctor) that must pass before any PR.
+- **Human-review triggers** that force a pause even when the agent thinks it's done.
+- **Recovery procedures** when an agent gets stuck.
+- **Per-issue context files** in `.planning/agent-context/` that give the agent everything it needs in one read.
+
+OpenClaw's concrete role split for this repo lives in:
+
+[`OPENCLAW-FORGE-PROTOCOL.md`](OPENCLAW-FORGE-PROTOCOL.md)
+
+Default execution is Vision → Gilfoyle → Heimdall → Vision/Aymen. Saga is not
+part of the default loop because this MCP has no UI surface to review.
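As a rough illustration of the canonical validation gate listed above (ruff → pyright → pytest → doctor), the sketch below runs the steps in order and stops at the first failure. Only the tool order comes from this roadmap. The command arguments, the `python-docs-mcp-server doctor` invocation, and the names `GATE_STEPS` and `run_gate` are assumptions made for illustration; the authoritative definition lives in `AGENT-EXECUTION-PIPELINE.md`.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical commands: only the tool ORDER is taken from the roadmap.
# The arguments and the "doctor" invocation are assumptions.
GATE_STEPS = [
    ("ruff", ["ruff", "check", "."]),
    ("pyright", ["pyright"]),
    ("pytest", ["pytest", "-q"]),
    ("doctor", ["python-docs-mcp-server", "doctor"]),
]

Runner = Callable[[Sequence[str]], int]


def _subprocess_runner(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code without raising on failure."""
    return subprocess.run(list(cmd), check=False).returncode


def run_gate(
    steps: Sequence[Tuple[str, Sequence[str]]] = GATE_STEPS,
    runner: Runner = _subprocess_runner,
) -> Tuple[bool, Optional[str]]:
    """Run each step in order; return (passed, name_of_first_failed_step)."""
    for name, cmd in steps:
        if runner(cmd) != 0:
            # Fail fast: later steps are skipped once one step fails.
            return False, name
    return True, None
```

Injecting `runner` keeps the ordering logic testable without the real tools installed; an agent harness would call `run_gate()` with the default runner from a clean clone and refuse to open a PR unless it returns `(True, None)`.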
+
+### 9.1 Deliverable annotations
+
+Each v0.3.0 deliverable in §4 is classified by agent-friendliness:
+
+| Deliverable (v0.3.0) | Agent-friendly? | Lead |
+|---|---|---|
+| Workstream J — zstd cache codec | **Yes (high)** | Agent |
+| README / PyPI / glama.json refresh to 6-tool surface | **Yes (high)** | Agent |
+| PyYAML safe-loader audit | **Yes (medium)** | Agent |
+| ADR-001 (Source Adapters) draft | **Yes (medium)** | Agent w/ strict template |
+| ADR-006 (Serialization) draft | **Yes (medium)** | Agent w/ strict template |
+| Build-time supply-chain: CPython SHA pin | **Yes (partial)** | Agent for the pin; human for SECURITY.md prose |
+| 30-minute TOON Python port audit | **No** | Human (subjective quality judgment) |
+| Empirical token study | **No** | Human (methodology + corpus selection); agent may scaffold the harness |
+
+The recommended overnight wave is the four high-confidence agent issues first — they produce obvious morning wins and de-risk the harder ones.
+
+### 9.2 Pre-flight before unleashing agents
+
+Before the first agent-ready issue is queued, the pre-flight checklist in `AGENT-EXECUTION-PIPELINE.md` §10 must be green. In particular:
+
+- `.github/CODEOWNERS`, `.github/ISSUE_TEMPLATE/autonomous-agent.yml`, and `.github/PULL_REQUEST_TEMPLATE/agent.md` must exist on `main`.
+- Branch protection on `main` must require ≥1 human approval.
+- The `🛑 needs-human-review` and `agent-ready` labels must exist.
+- The canonical validation gate must pass on `main` from a clean clone.
+
+### 9.3 Additional locked decisions for the pipeline
+
+| # | Decision |
+|---|----------|
+| 5.12 | Autonomous agents work only via the issue-and-PR flow defined in `AGENT-EXECUTION-PIPELINE.md`. Direct commits to `main` are forbidden; auto-merge is forbidden. |
+| 5.13 | The forbidden-territory list in `AGENT-EXECUTION-PIPELINE.md` §2 is binding. Any agent change touching those paths must pause for human review. |
+| 5.14 | Every agent-targetable issue must have a per-issue context file under `.planning/agent-context/.md` so the agent reads one source of truth instead of fishing across `.planning/` archive material. |
+
+---
+
+## 10. Review Triggers
+
+This roadmap is reviewed at:
+
+- Each minor release (v0.3.0, v0.4.0, v0.5.0, v1.0.0).
+- Any material change in the MCP ecosystem (e.g., Anthropic ships first-party docs retrieval; Context7 announces a Python-stdlib mode; a competitor MCP cracks 10k stars).
+- Owner's discretion when new external information arrives (e.g., another deep-research report; a sufficiently sharp critique from the community).
+
+Out-of-cycle amendments are tracked at the bottom of this file as `## Amendment YYYY-MM-DD` sections, preserving the original text. The locked-decisions table (§5) is the authoritative current state.