Merged
4 changes: 2 additions & 2 deletions AGENTS.md
@@ -66,9 +66,9 @@ The parser rejects absolute paths, `..`, control characters, and writes outside

## Eval Boundary

- Use `runKnowledgeBaseOptimization()` when comparing candidate knowledge bases on an actual task corpus. It delegates to `@tangle-network/agent-eval` multi-shot optimization, so single-turn and multi-turn agents share the same path.
+ Compare candidate knowledge bases on an actual task corpus by running an `@tangle-network/agent-eval` improvement loop (`runImprovementLoop`) over the variants; each run is scored into a `RunRecord`.

- Use `knowledgeReleaseReportFromOptimization()` before promotion. It projects optimizer traces and `RunRecord` rows into `agent-eval` release confidence evidence.
+ Use `knowledgeReleaseReport()` before promotion. It folds the candidate and baseline `RunRecord[]` (plus optional traces and the gate decision) into `agent-eval` release confidence evidence.

## Integration Boundaries

8 changes: 4 additions & 4 deletions README.md
@@ -29,7 +29,7 @@ Two ways in, depending on what you're doing:
- **Drive it from an agent** → pick the primitive by intent:
- *"Does the agent have enough context to run?"* → [`buildEvalKnowledgeBundle`](#agent-eval-integration) (block / ask / acquire before execution).
- *"Grow the KB as a researcher"* → [`runKnowledgeResearchLoop`](#research-loop) (deterministic mechanics; your agent owns judgment) or the sandbox [researcher profile](#researcher-profile) for `runLoop`.
- - *"Does this candidate KB actually improve task success?"* → `runKnowledgeBaseOptimization` ([Agent-Eval integration](#agent-eval-integration)).
+ - *"Does this candidate KB actually improve task success?"* → run an [agent-eval improvement loop](#agent-eval-integration) over KB variants, then `knowledgeReleaseReport` for the promotion decision.
- *"Keep live authorities fresh"* → [pluggable sources](#pluggable-knowledge-sources) + `detectChanges` → eval re-runs.

Storage stays consumer-owned via `KbStore` (`MemoryKbStore`, `FileSystemKbStore`, or your own D1/Postgres). Every primitive below is source-grounded: claims cite immutable source records, and lint fails on un-grounded citations.
@@ -98,7 +98,7 @@ from `@tangle-network/agent-knowledge`.
hit *in this result set* (top hit = 1, others = score / topScore) — use it
when comparing against natural confidence thresholds. The normalization is
within-set ranking, not a cross-query absolute confidence.
- - Optimization uses `@tangle-network/agent-eval` internally instead of reimplementing eval gates.
+ - Release confidence uses `@tangle-network/agent-eval` release gates (`evaluateReleaseConfidence`) instead of reimplementing them.
- `buildEvalKnowledgeBundle()` maps wiki/search evidence into
`agent-eval` `KnowledgeRequirement`, `KnowledgeBundle`, and
`KnowledgeReadinessReport` contracts so control loops can block, ask, or
@@ -108,9 +108,9 @@ The `/viz` subpath exports graph insight helpers without UI dependencies.

## Agent-Eval Integration

- Use `runKnowledgeBaseOptimization()` when the question is whether a candidate knowledge base actually improves agent task success. The candidate is passed through `runMultiShotOptimization`, so `n=1` single-turn tasks and variable-length multi-turn traces use the same path.
+ To answer whether a candidate knowledge base actually improves agent task success, run an `@tangle-network/agent-eval` improvement loop (`runImprovementLoop`) over your KB variants on a real task corpus; each run is scored into a `RunRecord`.

- Use `knowledgeReleaseReportFromOptimization()` to turn optimizer output into release confidence evidence using `agent-eval` release gates and `RunRecord` validation.
+ Use `knowledgeReleaseReport()` before promotion: pass the candidate and baseline `RunRecord[]` (plus optional `ReleaseTraceEvidence` and the gate decision) and it folds them into a `ReleaseConfidenceScorecard` and a `KnowledgeRelease` using `agent-eval`'s release gates and `RunRecord` validation.

Use `buildEvalKnowledgeBundle()` before execution when the question is whether
the agent has enough task-world context to run:
6 changes: 3 additions & 3 deletions docs/architecture.md
@@ -9,7 +9,7 @@ It does not try to be a vector database, a RAG framework, or a product-specific
- claims with source references
- deterministic indexing, graph construction, search, and lint
- safe LLM write proposals
- - eval-gated optimization through `@tangle-network/agent-eval`
+ - eval-gated release confidence through `@tangle-network/agent-eval`
- visualization DTOs under the `/viz` subpath
- storage contracts with memory/filesystem reference adapters
- discovery worker/dispatcher contracts
@@ -18,7 +18,7 @@ It does not try to be a vector database, a RAG framework, or a product-specific

## Boundaries

- `agent-eval` owns traces, ASI, multi-shot optimization, run records, and promotion gates.
+ `agent-eval` owns traces, ASI, improvement loops, run records, and promotion gates.

`agent-knowledge` owns sources, claims, pages, graph/search/lint, and knowledge base candidates. It calls `agent-eval` instead of reimplementing evaluation.

@@ -34,7 +34,7 @@ Core does not own a D1 schema or fleet dispatcher. Apps wire `KbStore` and `Know
4. Validate paths, citations, links, and schema.
5. Index generated knowledge pages.
6. Search and graph-lint the knowledge base.
- 7. Evaluate candidate KB variants with `runKnowledgeBaseOptimization`.
+ 7. Evaluate candidate KB variants with an `agent-eval` improvement loop, then fold the resulting run records into release confidence with `knowledgeReleaseReport`.
8. Promote only variants that pass downstream gates.

## CLI
14 changes: 9 additions & 5 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "@tangle-network/agent-knowledge",
- "version": "1.5.2",
+ "version": "1.6.0",
"description": "Source-grounded, eval-gated knowledge growth primitives for agents.",
"homepage": "https://github.com/tangle-network/agent-knowledge#readme",
"repository": {
@@ -63,21 +63,25 @@
"format": "biome format --write src tests"
},
"dependencies": {
- "@tangle-network/agent-eval": "^0.42.0",
- "@tangle-network/agent-runtime": "^0.25.0",
+ "@tangle-network/agent-eval": "^0.77.0",
+ "@tangle-network/agent-runtime": "^0.44.0",
"zod": "^4.3.6"
},
"devDependencies": {
"@biomejs/biome": "^2.4.15",
- "@tangle-network/sandbox": "^0.3.0",
+ "@tangle-network/sandbox": "^0.4.0",
"@types/node": "^25.6.0",
"tsup": "^8.0.0",
"typescript": "^5.7.0",
"vitest": "^3.0.0"
},
"pnpm": {
"minimumReleaseAge": 4320,
- "minimumReleaseAgeExclude": []
+ "minimumReleaseAgeExclude": [
+ "@tangle-network/agent-eval",
+ "@tangle-network/agent-runtime",
+ "@tangle-network/sandbox"
+ ]
},
"engines": {
"node": ">=20"
94 changes: 26 additions & 68 deletions pnpm-lock.yaml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 2 additions & 3 deletions src/release.ts
@@ -21,9 +21,8 @@ export interface KnowledgeReleaseReport {
* loop) supplies the candidate/baseline `RunRecord[]` (e.g. via
* `campaignToRunRecords`) + optional per-instance `ReleaseTraceEvidence` + the
* gate decision; this folds them into a `ReleaseConfidenceScorecard` + a
- * `KnowledgeRelease`. Decoupled from any optimizer result shape — agent-eval's
- * legacy multi-shot orchestration (and its `MultiShotOptimizationResult`) was
- * removed in 0.42; release confidence is computed from records + traces.
+ * `KnowledgeRelease`. Release confidence is computed from run records + traces,
+ * independent of any optimizer result shape.
*/
export interface KnowledgeReleaseInput {
candidateId: string
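The docstring in `src/release.ts` describes folding candidate and baseline run records into a release decision. The idea can be sketched roughly as follows, assuming deliberately simplified shapes: `SketchRunRecord`, `SketchReleaseReport`, `sketchReleaseReport`, and the `minLift` threshold are all hypothetical names for illustration, not the `agent-eval` or `agent-knowledge` API, whose real `RunRecord` and scorecard types carry far more detail.

```typescript
// Hypothetical, simplified stand-in for a run record (illustration only).
interface SketchRunRecord {
  taskId: string
  passed: boolean
}

// Hypothetical, simplified stand-in for a release report (illustration only).
interface SketchReleaseReport {
  candidateId: string
  candidatePassRate: number
  baselinePassRate: number
  decision: "promote" | "hold"
}

// Fraction of records that passed. An empty set counts as 0 so that a
// candidate with no evidence can never look better than the baseline.
function passRate(records: SketchRunRecord[]): number {
  if (records.length === 0) return 0
  return records.filter((r) => r.passed).length / records.length
}

// Fold candidate and baseline records into one promotion decision.
// `minLift` is an assumed threshold, not a parameter of the real gate.
function sketchReleaseReport(
  candidateId: string,
  candidate: SketchRunRecord[],
  baseline: SketchRunRecord[],
  minLift = 0,
): SketchReleaseReport {
  const candidatePassRate = passRate(candidate)
  const baselinePassRate = passRate(baseline)
  const decision =
    candidate.length > 0 && candidatePassRate - baselinePassRate > minLift
      ? "promote"
      : "hold"
  return { candidateId, candidatePassRate, baselinePassRate, decision }
}

const report = sketchReleaseReport(
  "kb-v2",
  [
    { taskId: "t1", passed: true },
    { taskId: "t2", passed: true },
  ],
  [
    { taskId: "t1", passed: true },
    { taskId: "t2", passed: false },
  ],
)
console.log(report.decision) // prints "promote"
```

The point of the sketch is the input shape: the decision depends only on two sets of records, not on the result object of whichever optimizer or loop produced them, which matches the decoupling the updated docstring describes.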