Merged
24 changes: 24 additions & 0 deletions .coderabbit.yaml
@@ -0,0 +1,24 @@
# yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
language: "en-US"

reviews:
  profile: "chill"
  request_changes_workflow: false
  high_level_summary: true
  review_status: true
  path_filters:
    - "src/**"
    - "tests/**"
Comment on lines +9 to +11

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Path filters exclude user-facing docs required by coding guidelines.

The path filters limit review to src/** and tests/**, but:

  1. The coding guideline requires *.md files (user-facing docs) to reflect current behavior.
  2. The src/** instruction on line 18 says to check if "changed source code makes those docs materially inaccurate," but that's impossible if .md files are filtered out.
  3. Config files like pyproject.toml are excluded, missing potential packaging-impact review.
  4. User-facing docs like README.md (referenced in issue #47 context) won't be reviewed.

Consider adding to path_filters:

   path_filters:
     - "src/**"
     - "tests/**"
+    - "*.md"
+    - "pyproject.toml"

And add corresponding path_instructions for *.md:

    - path: "*.md"
      instructions: |
        Focus review on whether user-facing documentation accurately reflects
        current behavior, especially when source code changes in this PR affect
        documented APIs, tool names, commands, or workflows.

As per coding guidelines, *.md: User-facing docs must reflect the current behavior.
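
To see why the current filters skip the docs, the check below approximates include-style glob filtering with Python's `fnmatch`. It is only a sketch: CodeRabbit's own matcher may treat `*` and `**` differently (for example, `fnmatch` lets `*` cross directory separators), and the `is_reviewed` helper is hypothetical.

```python
from fnmatch import fnmatch


def is_reviewed(path: str, filters: list[str]) -> bool:
    """Return True if any include-style filter matches the path (approximation)."""
    return any(fnmatch(path, pattern) for pattern in filters)


# Filters as they stand in this PR, and with the additions proposed above.
current = ["src/**", "tests/**"]
proposed = current + ["*.md", "pyproject.toml"]

for path in ["src/mcp_server_python_docs/server.py", "README.md", "pyproject.toml"]:
    # Columns: path, matched by current filters, matched by proposed filters.
    print(path, is_reviewed(path, current), is_reviewed(path, proposed))
```

Under this approximation, `README.md` and `pyproject.toml` are matched only by the proposed filter list.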

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.coderabbit.yaml around lines 9 - 11, Update the .coderabbit.yaml
path_filters to include user-facing docs and important config files (e.g., add
"*.md" and "pyproject.toml" alongside "src/**" and "tests/**") and add a
corresponding path_instructions entry for "*.md" that instructs reviewers to
focus on whether user-facing documentation (e.g., README.md) accurately reflects
current behavior when source changes (referenced by src/**) affect APIs,
commands, or workflows; modify the path_filters and add the path: "*.md"
instructions block so the linter/reviewer will not skip docs and packaging
files.

  path_instructions:
    - path: "src/**"
      instructions: |
        Focus review on correctness, MCP tool behavior, runtime compatibility,
        cache/index compatibility, packaging impact, and security boundaries.
        Avoid comments about planning docs, release docs, or repository process
        unless the changed source code makes those docs materially inaccurate.
    - path: "tests/**"
      instructions: |
        Focus review on meaningful assertions, regression coverage, fixture
        correctness, deterministic behavior, and avoiding network- or
        environment-dependent tests unless the test is explicitly marked as an
        integration smoke test.
49 changes: 49 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,49 @@
# CODEOWNERS — forces maintainer review on forbidden-territory paths.
#
# Source of truth: AGENT-EXECUTION-PIPELINE.md §2 (Forbidden Territory),
# required by §10 (Pre-flight Checklist).
#
# For these rules to be ENFORCED, branch protection on `main` must enable
# "Require review from Code Owners". CODEOWNERS alone only requests review;
# branch protection is what blocks merge.
#
# Autonomous agents may NOT modify these paths without explicit human approval
# (pipeline §2). Any agent PR touching them must add the `🛑 needs-human-review`
# label and stop short of requesting merge (pipeline §7).

# --- Project identity, dependencies, classifiers (only `version` is agent-editable) ---
/pyproject.toml @ayhammouda

# --- Permanent commitments and trust posture ---
/LICENSE @ayhammouda
/SECURITY.md @ayhammouda

# --- Load-bearing brand assets ---
/README.md @ayhammouda
/.planning/POSITIONING.md @ayhammouda

# --- Release history (adding entries is fine; rewriting history is not) ---
/CHANGELOG.md @ayhammouda

# --- CI/CD and supply chain (release path especially) ---
# The single /.github/ rule covers workflows and release.yml. Last-matching-
# pattern wins in CODEOWNERS — adding narrower entries with the same owner
# below would be no-ops and would silently *override* this rule if a different
# owner is ever added here, so we keep ownership of /.github/ uniform.
/.github/ @ayhammouda

# --- Index schema and migrations (rebuilds existing user indexes) ---
# NOTE: the retrieved-docs *cache* table lives in
# src/mcp_server_python_docs/services/persistent_cache.py and is NOT covered
# here — it is best-effort, fingerprint-scoped, and agent-editable per
# decision 5.7. Only the canonical *index* schema is forbidden territory.
**/storage/schema.sql @ayhammouda
**/migrations/ @ayhammouda

# --- Archival roadmap history ---
/.planning/ROADMAP.md @ayhammouda

# --- Governing policy + strategy documents ---
/AGENT-EXECUTION-PIPELINE.md @ayhammouda
/OPENCLAW-FORGE-PROTOCOL.md @ayhammouda
/STRATEGIC-ROADMAP-2026-05-29.md @ayhammouda
108 changes: 108 additions & 0 deletions .github/ISSUE_TEMPLATE/autonomous-agent.yml
@@ -0,0 +1,108 @@
name: Autonomous Agent Task
description: A task spec scoped for unattended execution by an autonomous coding agent (Claude Code or similar).
title: "[vX.Y.Z] <scope> — <verb> <thing>"
body:
  - type: markdown
    attributes:
      value: |
        This template enforces the issue structure required by
        `AGENT-EXECUTION-PIPELINE.md` §3 (in the repo root). An issue missing
        any required section is **not** agent-ready and will not pass the §10
        pre-flight checklist. Do not apply the `agent-ready` label from this
        template; a maintainer applies it only after reading the completed
        issue end-to-end. Read the pipeline doc and
        `STRATEGIC-ROADMAP-2026-05-29.md` before filling this out.
  - type: textarea
    id: context
    attributes:
      label: Context (self-containment)
      description: Link to the per-issue context file, this pipeline doc, the roadmap, any relevant ADR or `.planning/phases/0X-*` directory, and prior related issues.
      value: |
        - Per-issue context file: `.planning/agent-context/<issue-slug>.md` (read this first)
        - Pipeline: `AGENT-EXECUTION-PIPELINE.md`
        - Roadmap: `STRATEGIC-ROADMAP-2026-05-29.md` §<section>
        - Related issues:
    validations:
      required: true
  - type: textarea
    id: goal
    attributes:
      label: Goal (one sentence)
      description: The single outcome that counts as success.
    validations:
      required: true
  - type: textarea
    id: acceptance
    attributes:
      label: Acceptance criteria (testable checkbox list)
      description: Each criterion must be testable, atomic, achievable without touching forbidden territory, and verifiable in <5 minutes (pipeline §4). Prefer exact commands and expected output.
      value: |
        - [ ] `<exact command>` <expected result>
        - [ ] `<exact command>` <expected result>
    validations:
      required: true
  - type: textarea
    id: scope
    attributes:
      label: Scope boundaries
      description: Explicit In scope / Out of scope. Out-of-scope work is a stop-and-comment trigger, never silent expansion.
      value: |
        **In scope:**
        -

        **Out of scope:**
        -
    validations:
      required: true
  - type: textarea
    id: forbidden
    attributes:
      label: Forbidden-territory reminders
      description: Repeat the AGENT-EXECUTION-PIPELINE.md §2 items relevant to THIS issue. If the task appears to require touching any of them, stop and comment.
    validations:
      required: true
  - type: textarea
    id: validation
    attributes:
      label: Validation commands (pipeline §5 gate)
      description: The exact canonical gate, in order, plus any change-type-specific gates. Must pass before any PR is opened.
      value: |
        ```bash
        uv run ruff check src/ tests/
        uv run pyright src/
        uv run pytest --tb=short -q
        uv run python-docs-mcp-server doctor
        ```
    validations:
      required: true
  - type: textarea
    id: pr-and-recovery
    attributes:
      label: PR requirements & recovery
      description: What the PR description must include (pipeline §6) and where to go if blocked (pipeline §8).
      value: |
        - PR title matches this issue title verbatim; body uses
          `.github/PULL_REQUEST_TEMPLATE/agent.md`.
        - Branch: `agent/<issue-number>-<kebab-summary>`.
        - If blocked: stop, write `WORKING-NOTES.md` on the branch, comment on
          this issue per pipeline §8. **No PR, no auto-merge, ever.**
    validations:
      required: true
  - type: input
    id: effort
    attributes:
      label: Effort estimate (hours)
      description: Rough hours. Agent must bail and escalate if work exceeds 2× this estimate (pipeline §8).
    validations:
      required: true
  - type: checkboxes
    id: acknowledgements
    attributes:
      label: Agent acknowledgements
      options:
        - label: I will work on a branch, never on `main`, and will not auto-merge.
          required: true
        - label: I will stop and comment rather than silently expand scope or touch forbidden territory.
          required: true
        - label: I will add `🛑 needs-human-review` if any pipeline §7 trigger fires.
          required: true
42 changes: 42 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE/agent.md
@@ -0,0 +1,42 @@
<!--
Autonomous-agent PR template. Enforces AGENT-EXECUTION-PIPELINE.md §6.
PR title MUST match the issue title verbatim. Never request auto-merge.
-->

Closes #<issue-number>

## Acceptance criteria
<!-- Copy every criterion from the issue. Check the box only when satisfied,
and add one line of evidence (command + observed result) per item. -->
- [ ] <criterion 1> — <evidence>
- [ ] <criterion 2> — <evidence>

## Validation gate output
<!-- Paste the tail of each gate command. All must be green before opening this PR. -->
```text
$ uv run ruff check src/ tests/
$ uv run pyright src/
$ uv run pytest --tb=short -q
$ uv run python-docs-mcp-server doctor
Comment on lines +17 to +20

⚠️ Potential issue | 🟡 Minor

Drop shell prompts from command-only example block.

Using $ without output triggers MD014 in markdownlint.

Proposed fix
-$ uv run ruff check src/ tests/
-$ uv run pyright src/
-$ uv run pytest --tb=short -q
-$ uv run python-docs-mcp-server doctor
+uv run ruff check src/ tests/
+uv run pyright src/
+uv run pytest --tb=short -q
+uv run python-docs-mcp-server doctor
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)

[warning] 17-17: Dollar signs used before commands without showing output (MD014, commands-show-output)
[warning] 18-18: Dollar signs used before commands without showing output (MD014, commands-show-output)
[warning] 19-19: Dollar signs used before commands without showing output (MD014, commands-show-output)
[warning] 20-20: Dollar signs used before commands without showing output (MD014, commands-show-output)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/PULL_REQUEST_TEMPLATE/agent.md around lines 17 - 20, Remove the
leading shell prompt characters from the command-only example block so the lines
`uv run ruff check src/ tests/`, `uv run pyright src/`, `uv run pytest
--tb=short -q`, and `uv run python-docs-mcp-server doctor` appear without a `$ `
prefix; edit the code block in .github/PULL_REQUEST_TEMPLATE/agent.md to replace
each `$ uv ...` line with the corresponding command-only line to satisfy
markdownlint MD014.

```
<!-- Plus any change-type-specific gates from pipeline §5 (stdio smoke,
validate-corpus, uv lock --check) that applied to this change. -->

## CodeRabbit review
<!-- After CodeRabbit comments, summarize findings as:
- Blocking: <items or None>
- Follow-up: <items or None>
- False positive: <items or None>
If CodeRabbit has not run yet, write "Pending." Do not mark findings green
by silence. -->
Pending.

## Why this approach
<!-- One paragraph max. If the issue fully prescribed the approach, say so.
If you cite a design choice NOT in the issue, that is a §7 trigger. -->

## Why this triggered human review
<!-- List any pipeline §7 triggers and explain each. If none, write "None."
If any fired: this PR is opened for review only — do NOT request merge,
and ensure the `🛑 needs-human-review` label is applied. -->
None.
49 changes: 49 additions & 0 deletions .planning/agent-context/adr-001-source-adapters.md
@@ -0,0 +1,49 @@
# Agent Context — ADR-001 (Source Adapters)

> One-read working context for issue `[v0.3.0] docs — write ADR-001 (Source Adapters)`.
> A **writing** task. Every claim must match the code — verify before you assert.

## 1. Roadmap excerpts (the principles you are recording)

- **Principle 2.1:** Canonical source only. CPython at a pinned tag for stdlib
docs; PyPI metadata API for package URLs. No scraped mirrors. No third-party indexers.
- **Principle 2.2:** Offline-first *runtime*. No network access at query time.
- **Principle 2.7:** Layered design with stable contracts; the **source
connector** is layer 1 of 8 and is what makes the pattern cloneable.

## 2. The two source adapters that exist today (describe these)

1. **CPython documentation source** (`src/mcp_server_python_docs/ingestion/`):
   - `cpython_versions.py` — pinned build targets (`CPYTHON_DOCS_BUILD_CONFIG`:
     per-version `tag` + `sphinx_pin`). Five versions: 3.10–3.14.
   - `__main__.py` `build-index` path — `git clone --depth 1 --branch <tag>` of
     `python/cpython`, builds docs with `sphinx-build -b json` in a dedicated venv.
   - `sphinx_json.py` — parses the Sphinx JSON output into the index; also loads
     `synonyms.yaml`. `inventory.py` — parses `objects.inv` for exact symbol resolution.
2. **PyPI metadata source** (`src/mcp_server_python_docs/services/package_docs.py`):
   - Backs `lookup_package_docs`. A **controlled** PyPI metadata lookup
     (`GET /pypi/<project>/json`) that returns only project/docs/homepage/source
     URLs — not a generic web fetch, not scraped docs.
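
The "URLs only" shape of the PyPI lookup can be illustrated with a small offline sketch. It assumes the response follows PyPI's JSON layout (`info.package_url`, `info.home_page`, `info.docs_url`, `info.project_urls`); the `extract_doc_urls` helper and its label mapping are hypothetical and are not the code in `package_docs.py`, so verify against that file before citing behavior in the ADR.

```python
from typing import Any


def extract_doc_urls(payload: dict[str, Any]) -> dict[str, str]:
    """Keep only project/docs/homepage/source URLs from a PyPI-style payload."""
    info = payload.get("info") or {}
    project_urls = info.get("project_urls") or {}
    urls: dict[str, str] = {}

    # Top-level fields first; missing or null values are skipped.
    if info.get("package_url"):
        urls["project"] = info["package_url"]
    if info.get("home_page"):
        urls["homepage"] = info["home_page"]
    if info.get("docs_url"):
        urls["docs"] = info["docs_url"]

    # Then well-known project_urls labels (this label mapping is an assumption).
    for label, url in project_urls.items():
        key = label.strip().lower()
        if key in ("documentation", "docs"):
            urls.setdefault("docs", url)
        elif key in ("source", "source code", "repository"):
            urls.setdefault("source", url)
        elif key in ("homepage", "home"):
            urls.setdefault("homepage", url)
    return urls


# Made-up sample payload; no network call is performed here.
sample = {
    "info": {
        "name": "example",
        "home_page": "https://example.org",
        "docs_url": None,
        "project_urls": {
            "Documentation": "https://example.org/docs",
            "Source": "https://example.org/src",
            "Funding": "https://example.org/fund",
        },
    }
}
print(extract_doc_urls(sample))
```

Everything outside the allow-listed keys (here, the `Funding` link) is dropped, which is the property that distinguishes a controlled metadata lookup from a generic fetch.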

## 3. The one documented exception to "offline-first"

- `lookup_package_docs` performs a network call to PyPI's metadata API. This is
**not** a docs-*query*-time call against the canonical stdlib index — it is a
controlled, narrowly-scoped metadata lookup. The ADR must state this exception
explicitly so the offline-first invariant (2.2) stays honest. (See README's
"Why not Context7" section and `SECURITY.md` scope for the existing framing.)

## 4. Known pitfalls

- **Verify, don't assume.** Open each cited file and confirm the behavior before
writing it into the ADR. An ADR that misstates current behavior is worse than none.
- Don't document adapters that don't exist (Rust/Go) beyond a single "future
adopters clone this contract" sentence — that's the cloneability point, not a claim.
- No code, schema, or workflow changes — writing only.
- Keep it factual; "reference architecture" is not claimed externally (5.6).

## 5. Decision log

- File path:
- Claims you verified against code (file:line):
- Anything ambiguous about the layer contract that you flagged for the maintainer:
52 changes: 52 additions & 0 deletions .planning/agent-context/adr-006-serialization.md
@@ -0,0 +1,52 @@
# Agent Context — ADR-006 (Serialization)

> One-read working context for issue `[v0.3.0] docs — write ADR-006 (Serialization)`.
> This is a **writing** task. You are recording locked decisions, not making new ones.

## 1. Roadmap excerpts (the decisions you are recording — verbatim)

- **Principle 2.5:** Wire format is explicit and pluggable on structured tools
only. Compact JSON default; TOON opt-in *if and only if* the empirical study
supports it. `get_docs` stays markdown. *Token economy is empirical, not architectural.*
- **Principle 2.7:** Layered design with stable contracts — eight layers, the
**serializer** being one of them.
- **Decision 5.3:** Storage stays SQLite + markdown. **TOON-as-storage killed.**
- **Decision 5.4:** Empirical Claude-tokenizer study **gates** the `format="toon"` decision.
- **Decision 5.5:** `format` parameter on `search_docs`, `list_versions`,
`compare_versions` **only**. JSON default; TOON opt-in. `get_docs` stays markdown.
- **Decision 5.8:** The study measures **client-side rewrap**, not just raw
payload tokens; reports tokens AND latency per tool family.

## 2. Code touch-points (for accuracy — describe, do NOT change)

- Tool results are Pydantic models in `src/mcp_server_python_docs/models.py`
(e.g. `GetDocsResult`); tools live in `server.py` and return those models,
which FastMCP serializes. The "serializer layer" is the conceptual seam where
a structured result becomes a wire string — that's what the `format` parameter
will eventually parameterize. You are documenting that seam, not building it.
- `get_docs` returns markdown content (`GetDocsResult.content`) — this is why it
is carved out of the `format` parameter (markdown is already the canonical body).
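
For orientation only, the seam described above might look like the following sketch. It is illustrative and not part of this task: `SearchHit` is a hypothetical dataclass standing in for the Pydantic result models, `serialize` is a hypothetical function name, and the TOON branch is deliberately left unimplemented because that decision is gated by the study.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SearchHit:
    """Hypothetical stand-in for a structured tool result model."""

    title: str
    uri: str


def serialize(result: SearchHit, format: str = "json") -> str:
    """Turn a structured result into a wire string (the serializer seam)."""
    if format == "json":
        # Compact JSON: no spaces after separators.
        return json.dumps(asdict(result), separators=(",", ":"), ensure_ascii=False)
    if format == "toon":
        # Opt-in and gated; nothing is assumed about a TOON encoder here.
        raise NotImplementedError("TOON output is gated on the token study")
    raise ValueError(f"unknown format: {format!r}")


print(serialize(SearchHit(title="asyncio", uri="library/asyncio.html")))
```

The point of the sketch is only that the structured result and the wire string are separate steps, so a `format` parameter has a single place to act.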

## 3. Pattern to follow

- There is no `docs/architecture/` ADR yet — you are establishing the house
style. Use the exact skeleton embedded in the issue. Keep it tight (1–2 pages).
- Number/name the file `docs/architecture/ADR-006-serialization.md` to match the
roadmap's ADR numbering (ADR-001 and ADR-006 are the first two written).

## 4. Known pitfalls

- **Do not invent.** If you find yourself making a serialization choice that is
not in §2 above, that's a pipeline §7 trigger ("cites a design choice not in
the issue") — stop and comment.
- **Do not implement `format`.** That is v0.3.x and is gated by the study.
- Don't claim a TOON token win — the study hasn't run. The ADR records that TOON
is *opt-in and gated*, with the bar being "win holds after client rewrap" (5.8).
- "Reference architecture" is **not** claimed externally (decision 5.6) — keep
the ADR factual, not promotional.

## 5. Decision log

- Final file path:
- Any wording you were unsure mapped to a locked decision (and how you resolved it):
- Open follow-ups (e.g. link to TOKEN-STUDY.md once it exists):
67 changes: 67 additions & 0 deletions .planning/agent-context/cpython-source-sha-pin.md
@@ -0,0 +1,67 @@
# Agent Context — CPython source SHA pin

> One-read working context for issue `[v0.3.0] ingestion — pin CPython source by commit SHA`.
> PARTIAL issue: you do the pin + verification; the human writes the SECURITY.md prose.

## 1. Roadmap excerpt

> **Build-time supply-chain hardening** (roadmap §4, v0.3.0): Pin CPython source
> by SHA, not by tag. Document the threat model in SECURITY.md (the `build-index`
> CPython clone is the largest non-runtime attack surface). Verify Sphinx-build
> environment isolation.
>
> **Decision 5.10 (locked):** Build-time supply chain (the `build-index` CPython
> clone) is an explicit risk area; threat model documented in SECURITY.md;
> CPython source pinned by SHA.

## 2. Code touch-points

- `src/mcp_server_python_docs/ingestion/cpython_versions.py`
  - `CPythonDocsBuildConfig(TypedDict)` — add `sha: str`.
  - `CPYTHON_DOCS_BUILD_CONFIG` — five entries, currently `{"tag": ..., "sphinx_pin": ...}`:
    `3.10→v3.10.20`, `3.11→v3.11.15`, `3.12→v3.12.13`, `3.13→v3.13.13`, `3.14→v3.14.4`.
    Add the resolved SHA to each. Resolve with:
    `git ls-remote https://github.com/python/cpython.git refs/tags/<tag>`
    (use the dereferenced commit — the `<tag>^{}` line — not the annotated-tag object).
- `src/mcp_server_python_docs/__main__.py:210–226` — the clone:
  `git clone --depth 1 --branch config["tag"] https://github.com/python/cpython.git <clone_dir>`.
  After it, add: `rev = git -C <clone_dir> rev-parse HEAD`; if `rev != config["sha"]`,
  log a clear error and **abort this version's build** (raise / skip-with-failure —
  match the existing error-handling style in this function; do not silently continue).
- `tests/test_ingestion.py:53` — existing assertion
  `config["tag"].startswith(f"v{version}.")`. Add a sibling assertion that
  `config["sha"]` matches `^[0-9a-f]{40}$`.
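
A minimal sketch of the post-clone gate described above, under stated assumptions: the names `head_commit`, `check_sha`, `verify_clone`, and `ShaMismatchError` are hypothetical, and raising is only one of the two options listed (the real change should match the error-handling style already used in `__main__.py`).

```python
import re
import subprocess

SHA_RE = re.compile(r"[0-9a-f]{40}")


class ShaMismatchError(RuntimeError):
    """Raised when the cloned commit differs from the pinned SHA."""


def head_commit(clone_dir: str) -> str:
    """Return the commit SHA checked out in clone_dir."""
    result = subprocess.run(
        ["git", "-C", clone_dir, "rev-parse", "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def check_sha(actual: str, expected: str) -> None:
    """Abort (by raising) unless actual equals a well-formed pinned SHA."""
    if not SHA_RE.fullmatch(expected):
        raise ValueError(f"pinned SHA is not 40 lowercase hex chars: {expected!r}")
    if actual != expected:
        raise ShaMismatchError(f"cloned HEAD {actual} != pinned {expected}")


def verify_clone(clone_dir: str, expected_sha: str) -> None:
    """Post-clone gate: compare the checked-out commit with the pin."""
    check_sha(head_commit(clone_dir), expected_sha)
```

Keeping the comparison in `check_sha` separate from the `git` call means the mismatch logic can be unit-tested offline, without a clone.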

## 3. Patterns to follow

- `tests/test_ingestion.py` iterates `CPYTHON_DOCS_BUILD_CONFIG.items()` for the
tag assertion — extend that same loop for the SHA assertion. No new fixtures.
- The clone block already uses `subprocess.run([...], check=True, capture_output=True, text=True)`
— reuse that idiom for the `rev-parse` call.
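
The extended loop might look like the sketch below. `CONFIG` is a hypothetical stand-in for `CPYTHON_DOCS_BUILD_CONFIG` with placeholder SHAs (not real commits) and with `sphinx_pin` omitted; the test name is also an assumption.

```python
import re

# Hypothetical stand-in for CPYTHON_DOCS_BUILD_CONFIG; SHAs are placeholders.
CONFIG = {
    "3.13": {"tag": "v3.13.13", "sha": "0" * 40},
    "3.14": {"tag": "v3.14.4", "sha": "1" * 40},
}


def test_build_config_pins() -> None:
    for version, config in CONFIG.items():
        # Existing tag assertion.
        assert config["tag"].startswith(f"v{version}.")
        # Sibling assertion: the pin is a full 40-character lowercase hex SHA.
        assert re.fullmatch(r"[0-9a-f]{40}", config["sha"])


test_build_config_pins()
```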

## 4. Known pitfalls

- **`git clone --branch` accepts only a branch or tag name, never a raw commit SHA.**
Keep the tag-based shallow fetch; make the **SHA a post-clone
verification gate**, not the fetch ref. That is the integrity win: a moved/re-tagged
tag now fails the build instead of silently changing canonical content.
- Use the **dereferenced commit SHA** (peeled tag), not the annotated tag object's
own SHA — `rev-parse HEAD` after checkout gives the commit; match that.
- **Do not edit `SECURITY.md`** (forbidden). Draft the threat-model paragraph in
the PR body + decision log below for a human to paste.
- A full `build-index` clones over the network and takes minutes — do not gate the
PR on it. The unit tests cover the config + verification logic offline.
- Don't bump any tag to a newer CPython point release; pin the SHA of the
**current** tag only.
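
The peeled-versus-annotated distinction can be made concrete with a small parser over `git ls-remote` output, where each line is `<sha><TAB><ref>` and a peeled entry ends in `^{}`. This is a sketch: `peeled_commit` is a hypothetical helper, the sample SHAs and tag are made up, and the fallback to the plain ref (for a lightweight tag) is an assumption about how you might want to handle that case.

```python
def peeled_commit(ls_remote_output: str, tag: str) -> str:
    """Pick the commit SHA for a tag, preferring the peeled (^{}) entry."""
    plain = f"refs/tags/{tag}"
    peeled = plain + "^{}"
    found: dict[str, str] = {}
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        sha, _, ref = line.partition("\t")
        found[ref.strip()] = sha.strip()
    if peeled in found:
        # Annotated tag: the ^{} line carries the commit the tag points at.
        return found[peeled]
    if plain in found:
        # Lightweight tag: the ref itself already points at the commit.
        return found[plain]
    raise LookupError(f"tag {tag!r} not found in ls-remote output")


# Made-up output: first line is the tag object, second is the peeled commit.
sample = "a" * 40 + "\trefs/tags/v0.0.0\n" + "b" * 40 + "\trefs/tags/v0.0.0^{}\n"
print(peeled_commit(sample, "v0.0.0"))
```

The value returned for an annotated tag is the one that `rev-parse HEAD` reports after checkout, which is why it is the right value to pin.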

## 5. Decision log

- Resolved SHAs (tag → 40-hex commit), one line each:
  - 3.10 / v3.10.20 →
  - 3.11 / v3.11.15 →
  - 3.12 / v3.12.13 →
  - 3.13 / v3.13.13 →
  - 3.14 / v3.14.4 →
- Where/how the verification aborts on mismatch:
- **Draft SECURITY.md threat-model paragraph (for human to paste):**
>