feat(belief-state): add phase0 runtime measurement by drewstone · Pull Request #228 · tangle-network/agent-eval

drewstone · 2026-06-06T13:46:19Z

Summary

- add `buildRuntimeBeliefPhase0Measurement()` for joining runtime producer decisions, lifecycle evidence, labels, and run split metadata
- emit completed `BeliefDecisionPoint` rows plus a `BeliefDecisionResearchEvidencePacket` and coverage summary
- keep missing labels/run joins diagnostic-only and keep OPE blocked when propensities are absent
- keep the Phase 0 measurement test under `tests/belief-state/` because it exercises cross-module join behavior rather than a single local unit
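
The join described above can be pictured roughly as follows. This is only a minimal sketch under stated assumptions: the interfaces, the `joinPhase0` function, and the `opeBlocked` rule are hypothetical simplifications for illustration, not the actual types or logic in `phase0-measurement.ts`.

```typescript
// Hypothetical, simplified shapes; the real agent-eval types carry more fields.
interface ProducerDecision { decisionId: string; runId: string }
interface DecisionLabel { decisionId: string; reward: number; behaviorProb?: number }
interface RunRecord { runId: string; split: "train" | "dev" | "holdout" }

interface JoinedPoint {
  decisionId: string;
  runId: string;
  split: RunRecord["split"];
  reward: number;
  behaviorProb?: number;
}

interface CoverageSummary {
  producerDecisionCount: number;
  completedPointCount: number;
  missingRunRecordCount: number;
  missingLabelCount: number;
  withBehaviorProb: number;
  opeBlocked: boolean;
}

function joinPhase0(
  decisions: ProducerDecision[],
  labels: DecisionLabel[],
  runs: RunRecord[],
): { points: JoinedPoint[]; summary: CoverageSummary } {
  const labelById = new Map<string, DecisionLabel>(
    labels.map((l): [string, DecisionLabel] => [l.decisionId, l]),
  );
  const runById = new Map<string, RunRecord>(
    runs.map((r): [string, RunRecord] => [r.runId, r]),
  );

  const points: JoinedPoint[] = [];
  let missingRunRecordCount = 0;
  let missingLabelCount = 0;

  for (const decision of decisions) {
    const run = runById.get(decision.runId);
    const label = labelById.get(decision.decisionId);
    // Missing joins are counted as diagnostics; no row is fabricated for them.
    if (!run) missingRunRecordCount += 1;
    if (!label) missingLabelCount += 1;
    if (!run || !label) continue;

    points.push({
      decisionId: decision.decisionId,
      runId: decision.runId,
      split: run.split,
      reward: label.reward,
      // Propensities are copied only when the label supplies them.
      ...(label.behaviorProb !== undefined ? { behaviorProb: label.behaviorProb } : {}),
    });
  }

  const withBehaviorProb = points.filter((p) => p.behaviorProb !== undefined).length;
  return {
    points,
    summary: {
      producerDecisionCount: decisions.length,
      completedPointCount: points.length,
      missingRunRecordCount,
      missingLabelCount,
      withBehaviorProb,
      // Assumption: OPE stays blocked unless every completed point has a propensity.
      opeBlocked: points.length === 0 || withBehaviorProb < points.length,
    },
  };
}
```

In this sketch a decision that lacks both a run record and a label increments both counters, which is one possible convention; the real summary may count such cases differently.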

Verification

- `pnpm exec vitest run tests/belief-state/phase0-measurement.test.ts`
- `pnpm exec vitest run src/belief-state`
- `pnpm typecheck`
- `pnpm lint`
- `pnpm test`
- `pnpm build`
- `pnpm verify:package`

Notes: pnpm lint exits cleanly with two pre-existing warnings outside this patch.

tangletools · 2026-06-06T13:55:31Z

✅ No Blockers — `ab2a6f7a`

Readiness 86/100 · Confidence 75/100 · 6 findings (6 low)

| | deepseek | glm | aggregate |
|---|---|---|---|
| Readiness | 86 | 86 | 86 |
| Confidence | 75 | 75 | 75 |
| Correctness | 86 | 86 | 86 |
| Security | 86 | 86 | 86 |
| Testing | 86 | 86 | 86 |
| Architecture | 86 | 86 | 86 |

Full multi-shot audit completed 3/3 planned shots over 5 changed files. Global verifier still owns final merge decision.

🟡 LOW split metadata requirement inconsistency with roadmap — .evolve/pursuits/2026-06-04-belief-state-agent-eval.md

Line 30: next-action promotion requirements dropped 'split metadata' from the list (old: '>= 200 labeled decision points, split metadata, integrity checks...', new: '>= 200 labeled decision points, integrity checks...'). However the roadmap at docs/research/belief-state-agent-eval-roadmap.md:228 still lists 'Every row has train/dev/holdout split' as a Phase 0 completion criterion. The code (phase0-measurement.ts:104,162) captures split metadata from RunRecord, but the research evidence gates (research-evidence.ts) do not currently enforce it. This may be intentional — reflecting that split is data-property not gate — but the inconsistency

🟡 LOW Test map row is aspirational, not verified — docs/research/belief-state-agent-eval-roadmap.md

Line 572 adds a test-map row for phase0-measurement.test.ts describing expected behavior ('joins runtime producer decisions...without fabricating missing joins or propensities'). This is a planning artifact — the test file exists in the PR's code changes (outside this shot's scope) but the roadmap assertion about what it tests is only as reliable as the test implementation. No action needed for a docs-only shot, but the global verifier should confirm the test file matches this description.

🟡 LOW Tests don't exercise label-to-point probability propagation — src/belief-state/phase0-measurement.test.ts

No test verifies that RuntimeBeliefDecisionLabel.behaviorProb or .targetProb propagate through to BeliefDecisionPoint.behaviorProb/.targetProb. The counterfactual test only asserts their absence. The underlying runtimeDecisionPointToBeliefDecisionPoint handles this correctly (tested in runtime-hooks.test.ts), but the Phase 0 integration path isn't covered. Add a test with labels carrying behaviorProb=0.3 and targetProb=0.5 and assert withBehaviorProb and withTargetProb summary counts.
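
The suggested test might take roughly the following shape. This is a sketch only: `measureStandIn` is a hypothetical stand-in for the Phase 0 path, and the `Label` and `Point` interfaces are simplified assumptions. In the repository the test would presumably call the real measurement function and use vitest assertions instead of plain checks.

```typescript
// Hypothetical, simplified shapes for illustration.
interface Label { decisionId: string; behaviorProb?: number; targetProb?: number }
interface Point { decisionId: string; behaviorProb?: number; targetProb?: number }

// Stand-in for the Phase 0 integration path (NOT the real implementation).
function measureStandIn(decisionIds: string[], labels: Label[]) {
  const byId = new Map<string, Label>(
    labels.map((l): [string, Label] => [l.decisionId, l]),
  );
  const points: Point[] = [];
  for (const decisionId of decisionIds) {
    const label = byId.get(decisionId);
    if (!label) continue; // unlabeled decisions produce no point
    points.push({
      decisionId,
      ...(label.behaviorProb !== undefined ? { behaviorProb: label.behaviorProb } : {}),
      ...(label.targetProb !== undefined ? { targetProb: label.targetProb } : {}),
    });
  }
  return {
    points,
    summary: {
      withBehaviorProb: points.filter((p) => p.behaviorProb !== undefined).length,
      withTargetProb: points.filter((p) => p.targetProb !== undefined).length,
    },
  };
}

// Shape of the propagation check: one label carries both probabilities, one carries none.
function checkProbabilityPropagation() {
  const result = measureStandIn(
    ["d1", "d2"],
    [
      { decisionId: "d1", behaviorProb: 0.3, targetProb: 0.5 },
      { decisionId: "d2" },
    ],
  );
  const first = result.points[0];
  return {
    propagated: first.behaviorProb === 0.3 && first.targetProb === 0.5,
    withBehaviorProb: result.summary.withBehaviorProb,
    withTargetProb: result.summary.withTargetProb,
  };
}
```

Including one label without probabilities alongside the one with them lets the same test confirm that the summary counts only the points that actually carry values.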

🟡 LOW Loose options spread passes runtime fields to downstream packet builder — src/belief-state/phase0-measurement.ts

Line 127-130: buildBeliefDecisionResearchEvidencePacket({ ...options, points }) spreads all BuildRuntimeBeliefPhase0MeasurementOptions fields including runs, decisions, events, labels into the packet builder, which passes them to analyzeBeliefDecisionCorpus. JS ignores unknown properties at runtime so no functional bug, but future field name collisions and auditability suffer. Fix: destructure only the fields that BuildBeliefDecisionResearchEvidencePacketOptions accepts.
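
A sketch of the suggested fix, assuming hypothetical option fields: `minLabeledPoints` and `corpusId` are invented here for illustration, and `buildPacketStandIn` is a stand-in rather than the real packet builder. The point is only that named fields are forwarded explicitly instead of spreading the whole options object.

```typescript
// Hypothetical option shapes; the real option types are not shown in this PR page.
interface PacketOptions {
  points: readonly string[];
  minLabeledPoints?: number;
  corpusId?: string;
}

interface MeasurementOptions extends Omit<PacketOptions, "points"> {
  runs: readonly string[];
  decisions: readonly string[];
  events: readonly string[];
  labels: readonly string[];
}

// Stand-in for the downstream builder; it reports which keys it was given.
function buildPacketStandIn(options: PacketOptions): { receivedKeys: string[] } {
  return { receivedKeys: Object.keys(options).sort() };
}

function measureWithExplicitOptions(options: MeasurementOptions) {
  // Placeholder for the real join; here every decision becomes a point.
  const points = [...options.decisions];

  // Forward only the fields the packet builder accepts, by name.
  const { minLabeledPoints, corpusId } = options;
  return buildPacketStandIn({ points, minLabeledPoints, corpusId });
}
```

Naming the forwarded fields means a field added later to the measurement options cannot reach the packet builder by accident, at the cost of updating this list whenever the packet options grow.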

🟡 LOW compactMetadata duplicated across phase0-measurement.ts and runtime-hooks.ts — src/belief-state/phase0-measurement.ts

Lines 175-178 define compactMetadata identically to runtime-hooks.ts:381-384. Same signature, same filter-from-entries logic. Should be extracted to a shared internal utility (e.g. ./internal/compact-metadata.ts) or re-exported from runtime-hooks. Minor DRY violation that increases maintenance surface.
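
A shared helper could look something like the sketch below. The body is an assumption: the finding only says the function uses filter-from-entries logic, so dropping `undefined` values (and keeping `null`) is a guess at its behavior, and the real signature may differ.

```typescript
// Assumed behavior: remove keys whose value is undefined, keep everything else.
// In the repository this would be exported from a shared internal module and
// imported by both phase0-measurement.ts and runtime-hooks.ts.
function compactMetadata(metadata: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(metadata).filter(([, value]) => value !== undefined),
  );
}
```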

🟡 LOW labelJoinRate conflates label join success with downstream validation failures — src/belief-state/phase0-measurement.ts

Line 157: labelJoinRate: ratio(points.length, producerDecisionCount). The numerator is points.length (points that passed ALL downstream validation via runtimeDecisionPointToBeliefDecisionPoint), not producerDecisionCount - missingRunRecordCount - missingLabelCount. If a label join succeeds but runtimeDecisionPointToBeliefDecisionPoint returns no point (e.g. chosenAction missing, unsupported kind), labelJoinRate drops without any diagnostic explaining the mismatch between missingLabelCount and the actual completed count. The runJoinRate on [line 156](https://github.com/tangle-network/agent-eval/blob/ab2a6f7ad415e3b4a6c0d76309be8a8750b7
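
One way to separate the two signals is sketched below. All names here (`JoinCounts`, `labelJoinedCount`, `completionRate`, `rejectedAfterJoinCount`) are hypothetical, and the zero-denominator behavior of `ratio` is an assumption rather than the repository's actual helper.

```typescript
// Assumption: a zero denominator yields 0 rather than NaN.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

// Hypothetical counters gathered during the join.
interface JoinCounts {
  producerDecisionCount: number;
  labelJoinedCount: number;    // decisions for which a label was found
  completedPointCount: number; // decisions that also passed downstream conversion
}

function summarizeJoins(counts: JoinCounts) {
  return {
    // Measures only whether a label was found for the decision.
    labelJoinRate: ratio(counts.labelJoinedCount, counts.producerDecisionCount),
    // Measures how many decisions became completed points.
    completionRate: ratio(counts.completedPointCount, counts.producerDecisionCount),
    // Explains the gap: joined but rejected during conversion.
    rejectedAfterJoinCount: counts.labelJoinedCount - counts.completedPointCount,
  };
}
```

With separate counters, a drop caused by a missing `chosenAction` or an unsupported kind shows up in `rejectedAfterJoinCount` instead of silently lowering the label join rate.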

tangletools · 2026-06-06T13:55:29Z · trace

tangletools

✅ Approved — 6 non-blocking findings — `ab2a6f7a`

Full multi-shot audit completed 3/3 planned shots over 5 changed files. Global verifier still owns final merge decision.

Full immutable report for this review: trace

Summary comment for this run: full summary

tangletools · 2026-06-06T13:55:29Z · immutable trace

tangletools

✅ Refreshed approval after new commits — `bcf09d0f`

A previous trusted approval on this PR was invalidated by new commits.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

tangletools · auto-approval · reason: stale_approval_refresh · 2026-06-06T17:28:50Z

feat(belief-state): add phase0 runtime measurement

ab2a6f7

tangletools previously approved these changes Jun 6, 2026

test(belief-state): move phase0 measurement test

bcf09d0

drewstone dismissed tangletools’s stale review via bcf09d0 June 6, 2026 17:28

tangletools approved these changes Jun 6, 2026

drewstone merged commit 4fbaa4f into main Jun 6, 2026
1 check passed

drewstone merged 2 commits into main from feat/belief-phase0-measurement
