From dd5ed7319c5285873a640add4c679847a0c5c612 Mon Sep 17 00:00:00 2001
From: Drew Stone
Date: Wed, 3 Jun 2026 09:07:25 -0600
Subject: [PATCH] chore(deps): bump substrate to agent-eval ^0.77.0 / agent-runtime ^0.44.0 + docs to live API
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

agent-knowledge was caret-trapped at agent-eval ^0.42.0
(>=0.42.0 <0.43.0, 35 minors behind 0.77.0) and agent-runtime ^0.25.0
(>=0.25.0 <0.26.0, 19 minors behind 0.44.0). Its empty
minimumReleaseAgeExclude also held first-party substrate releases
behind the 3-day (4320-minute) install cooldown.

- bump @tangle-network/agent-eval ^0.42.0 -> ^0.77.0, agent-runtime
  ^0.25.0 -> ^0.44.0, and sandbox (dev) ^0.3.0 -> ^0.4.0
- exclude the first-party @tangle-network substrate (agent-eval,
  agent-runtime, sandbox) from minimumReleaseAge so freshly published
  substrate versions install immediately; this matches the
  agent-runtime/tax/creative convention, since the age gate exists for
  third-party supply-chain risk, not for our own packages
- version 1.5.2 -> 1.6.0

The substrate imports are type-only (AnalystFinding/AnalystSeverity)
plus the agent-runtime/loops subpath, and all of them are unchanged in
0.77/0.44: typecheck exits 0, 99 tests pass, and the build is clean
with zero source edits.

The docs referenced removed or never-shipped functions
(runKnowledgeBaseOptimization, knowledgeReleaseReportFromOptimization,
runMultiShotOptimization). They are rewritten to the shipped contract:
run an agent-eval improvement loop over KB variants, then call
knowledgeReleaseReport() to fold the candidate/baseline RunRecords into
release confidence. README.md, AGENTS.md, docs/architecture.md, and the
release.ts docstring now describe only the current API.
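For reviewers, a simplified model of the two gates described above. This
is a sketch only, not node-semver's or pnpm's actual logic: it assumes
plain X.Y.Z versions and matches minimumReleaseAgeExclude entries by
exact package name. A 0.x caret range pins the minor, so ^0.42.0 can
never resolve 0.77.0, and the cooldown holds any release younger than
minimumReleaseAge minutes unless the package is excluded. The publish
timestamps in the usage lines are hypothetical.

```typescript
// Simplified model of the two install gates (NOT node-semver or pnpm code).
// Assumptions: plain X.Y.Z versions only (no prerelease tags), and
// minimumReleaseAgeExclude entries are matched by exact package name.

interface Version {
  major: number;
  minor: number;
  patch: number;
}

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (m === null) throw new Error(`unsupported version: ${v}`);
  return { major: Number(m[1]), minor: Number(m[2]), patch: Number(m[3]) };
}

// Negative if a < b, zero if equal, positive if a > b.
function compare(a: Version, b: Version): number {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// ^X.Y.Z may not change the left-most non-zero component, so for 0.x
// releases the caret pins the minor: ^0.42.0 means >=0.42.0 <0.43.0.
function caretUpperBound(base: Version): Version {
  if (base.major > 0) return { major: base.major + 1, minor: 0, patch: 0 };
  if (base.minor > 0) return { major: 0, minor: base.minor + 1, patch: 0 };
  return { major: 0, minor: 0, patch: base.patch + 1 };
}

function satisfiesCaret(version: string, range: string): boolean {
  if (!range.startsWith("^")) throw new Error(`not a caret range: ${range}`);
  const base = parseVersion(range.slice(1));
  const v = parseVersion(version);
  return compare(v, base) >= 0 && compare(v, caretUpperBound(base)) < 0;
}

interface ReleaseAgeConfig {
  minimumReleaseAge: number; // minutes; 4320 = 3 days
  minimumReleaseAgeExclude: string[];
}

// A version may be installed once it is at least minimumReleaseAge minutes
// old, unless its package name is excluded from the cooldown.
function passesReleaseAge(
  name: string,
  publishedAt: Date,
  now: Date,
  cfg: ReleaseAgeConfig,
): boolean {
  if (cfg.minimumReleaseAgeExclude.includes(name)) return true;
  const ageMinutes = (now.getTime() - publishedAt.getTime()) / 60_000;
  return ageMinutes >= cfg.minimumReleaseAge;
}

// Usage: the old caret range cannot reach 0.77.0, and a one-day-old
// release (hypothetical timestamps) is held back unless it is excluded.
const cfgBefore: ReleaseAgeConfig = { minimumReleaseAge: 4320, minimumReleaseAgeExclude: [] };
const cfgAfter: ReleaseAgeConfig = {
  minimumReleaseAge: 4320,
  minimumReleaseAgeExclude: ["@tangle-network/agent-eval"],
};
const published = new Date("2026-06-02T15:00:00Z");
const now = new Date("2026-06-03T15:00:00Z");

console.log(satisfiesCaret("0.77.0", "^0.42.0")); // false
console.log(satisfiesCaret("0.4.3", "^0.4.0")); // true
console.log(passesReleaseAge("@tangle-network/agent-eval", published, now, cfgBefore)); // false
console.log(passesReleaseAge("@tangle-network/agent-eval", published, now, cfgAfter)); // true
```

Excluding only the first-party names keeps the 4320-minute gate in
force for every third-party dependency.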
---
 AGENTS.md            |  4 +-
 README.md            |  8 ++--
 docs/architecture.md |  6 +--
 package.json         | 14 ++++---
 pnpm-lock.yaml       | 94 ++++++++++++--------------------------------
 src/release.ts       |  5 +--
 6 files changed, 46 insertions(+), 85 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index e33ec89..08fbddd 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -66,9 +66,9 @@ The parser rejects absolute paths, `..`, control characters, and writes outside
 
 ## Eval Boundary
 
-Use `runKnowledgeBaseOptimization()` when comparing candidate knowledge bases on an actual task corpus. It delegates to `@tangle-network/agent-eval` multi-shot optimization, so single-turn and multi-turn agents share the same path.
+Compare candidate knowledge bases on an actual task corpus by running an `@tangle-network/agent-eval` improvement loop (`runImprovementLoop`) over the variants; each run is scored into a `RunRecord`.
 
-Use `knowledgeReleaseReportFromOptimization()` before promotion. It projects optimizer traces and `RunRecord` rows into `agent-eval` release confidence evidence.
+Use `knowledgeReleaseReport()` before promotion. It folds the candidate and baseline `RunRecord[]` (plus optional traces and the gate decision) into `agent-eval` release confidence evidence.
 
 ## Integration Boundaries
 
diff --git a/README.md b/README.md
index 251440e..fb2f8dd 100644
--- a/README.md
+++ b/README.md
@@ -29,7 +29,7 @@ Two ways in, depending on what you're doing:
 - **Drive it from an agent** → pick the primitive by intent:
   - *"Does the agent have enough context to run?"* → [`buildEvalKnowledgeBundle`](#agent-eval-integration) (block / ask / acquire before execution).
   - *"Grow the KB as a researcher"* → [`runKnowledgeResearchLoop`](#research-loop) (deterministic mechanics; your agent owns judgment) or the sandbox [researcher profile](#researcher-profile) for `runLoop`.
-  - *"Does this candidate KB actually improve task success?"* → `runKnowledgeBaseOptimization` ([Agent-Eval integration](#agent-eval-integration)).
+  - *"Does this candidate KB actually improve task success?"* → run an [agent-eval improvement loop](#agent-eval-integration) over KB variants, then `knowledgeReleaseReport` for the promotion decision.
   - *"Keep live authorities fresh"* → [pluggable sources](#pluggable-knowledge-sources) + `detectChanges` → eval re-runs.
 
 Storage stays consumer-owned via `KbStore` (`MemoryKbStore`, `FileSystemKbStore`, or your own D1/Postgres). Every primitive below is source-grounded: claims cite immutable source records, and lint fails on un-grounded citations.
@@ -98,7 +98,7 @@ from `@tangle-network/agent-knowledge`.
   hit *in this result set* (top hit = 1, others = score / topScore) — use it
   when comparing against natural confidence thresholds. The normalization
   is within-set ranking, not a cross-query absolute confidence.
-- Optimization uses `@tangle-network/agent-eval` internally instead of reimplementing eval gates.
+- Release confidence uses `@tangle-network/agent-eval` release gates (`evaluateReleaseConfidence`) instead of reimplementing them.
 - `buildEvalKnowledgeBundle()` maps wiki/search evidence into `agent-eval`
   `KnowledgeRequirement`, `KnowledgeBundle`, and `KnowledgeReadinessReport`
   contracts so control loops can block, ask, or
@@ -108,9 +108,9 @@ The `/viz` subpath exports graph insight helpers without UI dependencies.
 
 ## Agent-Eval Integration
 
-Use `runKnowledgeBaseOptimization()` when the question is whether a candidate knowledge base actually improves agent task success. The candidate is passed through `runMultiShotOptimization`, so `n=1` single-turn tasks and variable-length multi-turn traces use the same path.
+To answer whether a candidate knowledge base actually improves agent task success, run an `@tangle-network/agent-eval` improvement loop (`runImprovementLoop`) over your KB variants on a real task corpus; each run is scored into a `RunRecord`.
 
-Use `knowledgeReleaseReportFromOptimization()` to turn optimizer output into release confidence evidence using `agent-eval` release gates and `RunRecord` validation.
+Use `knowledgeReleaseReport()` before promotion: pass the candidate and baseline `RunRecord[]` (plus optional `ReleaseTraceEvidence` and the gate decision) and it folds them into a `ReleaseConfidenceScorecard` and a `KnowledgeRelease` using `agent-eval`'s release gates and `RunRecord` validation.
 
 Use `buildEvalKnowledgeBundle()` before execution when the question is whether the agent has enough task-world context to run:
 
diff --git a/docs/architecture.md b/docs/architecture.md
index 7bbfa56..6800f7c 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -9,7 +9,7 @@ It does not try to be a vector database, a RAG framework, or a product-specific
 - claims with source references
 - deterministic indexing, graph construction, search, and lint
 - safe LLM write proposals
-- eval-gated optimization through `@tangle-network/agent-eval`
+- eval-gated release confidence through `@tangle-network/agent-eval`
 - visualization DTOs under the `/viz` subpath
 - storage contracts with memory/filesystem reference adapters
 - discovery worker/dispatcher contracts
@@ -18,7 +18,7 @@ It does not try to be a vector database, a RAG framework, or a product-specific
 
 ## Boundaries
 
-`agent-eval` owns traces, ASI, multi-shot optimization, run records, and promotion gates.
+`agent-eval` owns traces, ASI, improvement loops, run records, and promotion gates.
 
 `agent-knowledge` owns sources, claims, pages, graph/search/lint, and knowledge base candidates. It calls `agent-eval` instead of reimplementing evaluation.
 
@@ -34,7 +34,7 @@ Core does not own a D1 schema or fleet dispatcher. Apps wire `KbStore` and `Know
 4. Validate paths, citations, links, and schema.
 5. Index generated knowledge pages.
 6. Search and graph-lint the knowledge base.
-7. Evaluate candidate KB variants with `runKnowledgeBaseOptimization`.
+7. Evaluate candidate KB variants with an `agent-eval` improvement loop, then fold the resulting run records into release confidence with `knowledgeReleaseReport`.
 8. Promote only variants that pass downstream gates.
 
 ## CLI
diff --git a/package.json b/package.json
index 9be7f92..67054ca 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "@tangle-network/agent-knowledge",
-  "version": "1.5.2",
+  "version": "1.6.0",
   "description": "Source-grounded, eval-gated knowledge growth primitives for agents.",
   "homepage": "https://github.com/tangle-network/agent-knowledge#readme",
   "repository": {
@@ -63,13 +63,13 @@
     "format": "biome format --write src tests"
   },
   "dependencies": {
-    "@tangle-network/agent-eval": "^0.42.0",
-    "@tangle-network/agent-runtime": "^0.25.0",
+    "@tangle-network/agent-eval": "^0.77.0",
+    "@tangle-network/agent-runtime": "^0.44.0",
     "zod": "^4.3.6"
   },
   "devDependencies": {
     "@biomejs/biome": "^2.4.15",
-    "@tangle-network/sandbox": "^0.3.0",
+    "@tangle-network/sandbox": "^0.4.0",
     "@types/node": "^25.6.0",
     "tsup": "^8.0.0",
     "typescript": "^5.7.0",
@@ -77,7 +77,11 @@
   },
   "pnpm": {
     "minimumReleaseAge": 4320,
-    "minimumReleaseAgeExclude": []
+    "minimumReleaseAgeExclude": [
+      "@tangle-network/agent-eval",
+      "@tangle-network/agent-runtime",
+      "@tangle-network/sandbox"
+    ]
   },
   "engines": {
     "node": ">=20"
diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml
index a35beb4..d63dcfc 100644
--- a/pnpm-lock.yaml
+++ b/pnpm-lock.yaml
@@ -9,11 +9,11 @@ importers:
   .:
     dependencies:
       '@tangle-network/agent-eval':
-        specifier: ^0.42.0
-        version: 0.42.0(@tangle-network/agent-runtime@0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3))(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)
+        specifier: ^0.77.0
+        version: 0.77.0(@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)
       '@tangle-network/agent-runtime':
-        specifier: ^0.25.0
-        version: 0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)
+        specifier: ^0.44.0
+        version: 0.44.0(@tangle-network/agent-eval@0.77.0(@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3))(@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))
       zod:
         specifier: ^4.3.6
         version: 4.4.2
@@ -22,8 +22,8 @@ importers:
         specifier: ^2.4.15
         version: 2.4.15
       '@tangle-network/sandbox':
-        specifier: ^0.3.0
-        version: 0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))
+        specifier: ^0.4.0
+        version: 0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))
       '@types/node':
         specifier: ^25.6.0
         version: 25.6.0
@@ -451,47 +451,29 @@ packages:
   '@scure/bip39@2.2.0':
     resolution: {integrity: sha512-T/Bj/YvYMNkIPq6EENO6/rcs2e7qTNuyoUXf0KBFDmp0ZDu0H2X4Lq6yC3i0c8PcWkov5EbW+yQZZbdMmk154A==}
 
-  '@tangle-network/agent-eval@0.40.5':
-    resolution: {integrity: sha512-ew27fDkzvYcM/3/u6Jx1HGS3/bPoIWAXKGa/2XlOro2hBwMA/h37SAHg4ytUDMd2M0mAKQAAanUxnHfkt/aklw==}
+  '@tangle-network/agent-eval@0.77.0':
+    resolution: {integrity: sha512-VPqvOnafimhe9QoL2EL/VdtoF/muSEhfKmAfa0hoXScz8zkIsijTD0AL0SmrfqKTOi1j+9O7tmEXmz4sWJFl+w==}
     engines: {node: '>=20'}
     hasBin: true
     peerDependencies:
-      '@tangle-network/agent-runtime': ^0.21.0
-      '@tangle-network/sandbox': ^0.2.1
+      '@tangle-network/sandbox': '>=0.2.1 <0.5.0'
     peerDependenciesMeta:
-      '@tangle-network/agent-runtime':
-        optional: true
       '@tangle-network/sandbox':
         optional: true
 
-  '@tangle-network/agent-eval@0.42.0':
-    resolution: {integrity: sha512-gJFT1Vm5LYDHtIF0BUqGq6i3Qa9IvFr3EvTfAE1CYjErFNl3TohL1sduJqj1GXIhDbswVVuWp5qaahHZHaIsbA==}
-    engines: {node: '>=20'}
-    hasBin: true
-    peerDependencies:
-      '@tangle-network/agent-runtime': ^0.21.0
-      '@tangle-network/sandbox': ^0.2.1
-    peerDependenciesMeta:
-      '@tangle-network/agent-runtime':
-        optional: true
-      '@tangle-network/sandbox':
-        optional: true
-
-  '@tangle-network/agent-integrations@0.29.0':
-    resolution: {integrity: sha512-Avn4oBDTRP5v/3o1xq++uu/9+Rhl2hscIggeFPBGjtVYwhvbsSZL9pRrF3LfjqL9rjx9AocZOdsZC6MXrxKnkg==}
-    engines: {node: '>=20'}
-    hasBin: true
-
-  '@tangle-network/agent-runtime@0.25.0':
-    resolution: {integrity: sha512-8snUNiDIb/9aeLDZPyf1O1gdOTQ9CV4nXDoULwE0xoibG8c0Ob6eRJw6wmDcMlDYVwTQr2gkq/mwWuuJ+GfaNQ==}
+  '@tangle-network/agent-runtime@0.44.0':
+    resolution: {integrity: sha512-uMzWcziIV+SsgvdvvnnSobaFYZuYXQ3KRfvq9h9kHglVLtPoUGH78ypCnyn5QQIccTk4gjSenAHC5Iy076DkQg==}
     engines: {node: '>=20'}
     hasBin: true
     peerDependencies:
+      '@tangle-network/agent-eval': '>=0.61.0 <1.0.0'
       '@tangle-network/agent-knowledge': '>=1.3.0 <2.0.0'
-      '@tangle-network/sandbox': '>=0.1.2 <0.3.0'
+      '@tangle-network/sandbox': '>=0.1.2 <0.5.0'
     peerDependenciesMeta:
       '@tangle-network/agent-knowledge':
         optional: true
+      '@tangle-network/sandbox':
+        optional: true
 
   '@tangle-network/sandbox@0.1.2':
     resolution: {integrity: sha512-6TPH9QgCgou9Bhc1kzLNL4/PRiT1mjId6NONY5Le/KT2kh77cXH8KN3TTY/cU+/eW+WM5FYJOy32FWl2HShXbw==}
@@ -501,12 +483,12 @@ packages:
       viem:
         optional: true
 
-  '@tangle-network/sandbox@0.3.0':
-    resolution: {integrity: sha512-KfgvKhsUaOpkJe3AD19w7s4hdQekBlXQGoNx0xS4u6vuQk5YnFzBgv+EQeHCkkgETpYOWS2AN+6u/JhSyWStMw==}
+  '@tangle-network/sandbox@0.4.3':
+    resolution: {integrity: sha512-6QE3Nuhkd8f+OlpRJbumHTAG4wKR+ESXT47UE0fjTf7ndRWLnhE4RZ7YRtHVo/Q9ZZr0FGH1mwM+6tW0NAT1bA==}
     peerDependencies:
-      '@mastra/core': '*'
-      '@modelcontextprotocol/sdk': '*'
-      ai: '*'
+      '@mastra/core': ^1.36.0
+      '@modelcontextprotocol/sdk': ^1.29.0
+      ai: ^6.0.175
       openai: ^6.36.0
       viem: ^2.0.0
     peerDependenciesMeta:
@@ -1261,7 +1243,7 @@ snapshots:
       '@noble/hashes': 2.2.0
       '@scure/base': 2.2.0
 
-  '@tangle-network/agent-eval@0.40.5(@tangle-network/agent-runtime@0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3))(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)':
+  '@tangle-network/agent-eval@0.77.0(@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)':
     dependencies:
       '@asteasolutions/zod-to-openapi': 8.5.0(zod@4.4.2)
       '@ax-llm/ax': 19.0.45(zod@4.4.2)
@@ -1270,47 +1252,23 @@ snapshots:
       hono: 4.12.16
       zod: 4.4.2
     optionalDependencies:
-      '@tangle-network/agent-runtime': 0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)
-      '@tangle-network/sandbox': 0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))
+      '@tangle-network/sandbox': 0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))
     transitivePeerDependencies:
       - bufferutil
       - typescript
       - utf-8-validate
 
-  '@tangle-network/agent-eval@0.42.0(@tangle-network/agent-runtime@0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3))(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)':
+  '@tangle-network/agent-runtime@0.44.0(@tangle-network/agent-eval@0.77.0(@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3))(@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))':
     dependencies:
-      '@asteasolutions/zod-to-openapi': 8.5.0(zod@4.4.2)
-      '@ax-llm/ax': 19.0.45(zod@4.4.2)
-      '@hono/node-server': 2.0.1(hono@4.12.16)
-      '@tangle-network/tcloud': 0.4.6(typescript@5.9.3)(zod@4.4.2)
-      hono: 4.12.16
-      zod: 4.4.2
+      '@tangle-network/agent-eval': 0.77.0(@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)
     optionalDependencies:
-      '@tangle-network/agent-runtime': 0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)
-      '@tangle-network/sandbox': 0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))
-    transitivePeerDependencies:
-      - bufferutil
-      - typescript
-      - utf-8-validate
-
-  '@tangle-network/agent-integrations@0.29.0': {}
-
-  '@tangle-network/agent-runtime@0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)':
-    dependencies:
-      '@tangle-network/agent-eval': 0.40.5(@tangle-network/agent-runtime@0.25.0(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3))(@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2)))(typescript@5.9.3)
-      '@tangle-network/sandbox': 0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))
-    transitivePeerDependencies:
-      - bufferutil
-      - typescript
-      - utf-8-validate
+      '@tangle-network/sandbox': 0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))
 
   '@tangle-network/sandbox@0.1.2(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))':
     optionalDependencies:
       viem: 2.48.8(typescript@5.9.3)(zod@4.4.2)
 
-  '@tangle-network/sandbox@0.3.0(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))':
-    dependencies:
-      '@tangle-network/agent-integrations': 0.29.0
+  '@tangle-network/sandbox@0.4.3(viem@2.48.8(typescript@5.9.3)(zod@4.4.2))':
     optionalDependencies:
       viem: 2.48.8(typescript@5.9.3)(zod@4.4.2)
 
diff --git a/src/release.ts b/src/release.ts
index 14013f4..ad4b710 100644
--- a/src/release.ts
+++ b/src/release.ts
@@ -21,9 +21,8 @@ export interface KnowledgeReleaseReport {
  * loop) supplies the candidate/baseline `RunRecord[]` (e.g. via
  * `campaignToRunRecords`) + optional per-instance `ReleaseTraceEvidence` + the
  * gate decision; this folds them into a `ReleaseConfidenceScorecard` + a
- * `KnowledgeRelease`. Decoupled from any optimizer result shape — agent-eval's
- * legacy multi-shot orchestration (and its `MultiShotOptimizationResult`) was
- * removed in 0.42; release confidence is computed from records + traces.
+ * `KnowledgeRelease`. Release confidence is computed from run records + traces,
+ * independent of any optimizer result shape.
  */
 export interface KnowledgeReleaseInput {
   candidateId: string