feat(contract): selfImprove forwards the full loop surface + fail-loud default (0.82.0) by drewstone · Pull Request #226 · tangle-network/agent-eval

drewstone · 2026-06-05T23:23:34Z

Makes selfImprove a complete one-call surface, so a product agent can collapse its entire hand-rolled loop harness onto it with zero regression — the prerequisite for propagating the eval-campaign scaffold to the product agents.

Gap this closes

selfImprove dropped these when forwarding to runImprovementLoop: reps, promoteTopK, labeledStore, captureSource, expectUsage, analyzeGeneration, findings. A product collapsing onto it would silently lose labeled-example capture, replicates, and the analyst loop (EYES→HANDS), and run with a weaker backend-integrity guard. Now all are forwarded.
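A minimal sketch of the difference between the old and new forwarding behavior. This is not the actual agent-eval implementation: the option types, the `generations` field, and the `runImprovementLoopStub`, `selfImproveLossy`, and `selfImproveComplete` helpers are hypothetical stand-ins. Only the seven option names are taken from this PR.

```typescript
// Hypothetical option shape: only the seven names below come from the PR;
// their types and the `generations` field are assumptions for illustration.
type ExpectUsage = 'assert' | 'warn' | 'off';

interface LoopOptions {
  generations: number; // assumed field that was always forwarded
  reps?: number;
  promoteTopK?: number;
  labeledStore?: unknown;
  captureSource?: unknown;
  expectUsage?: ExpectUsage;
  analyzeGeneration?: (generation: number) => void;
  findings?: unknown[];
}

// Stand-in for runImprovementLoop: echoes the options it received so the
// forwarding behavior can be inspected.
function runImprovementLoopStub(options: LoopOptions): LoopOptions {
  return options;
}

// Old behavior (sketch): an explicit pick, so anything not listed is dropped.
function selfImproveLossy(options: LoopOptions): LoopOptions {
  return runImprovementLoopStub({ generations: options.generations });
}

// New behavior (sketch): spread the caller's options so every field,
// including the seven that were previously dropped, reaches the loop.
function selfImproveComplete(options: LoopOptions): LoopOptions {
  return runImprovementLoopStub({ ...options });
}

const callerOptions: LoopOptions = {
  generations: 2,
  reps: 3,
  promoteTopK: 2,
  findings: ['example finding'],
};

const lossyResult = selfImproveLossy(callerOptions);
const completeResult = selfImproveComplete(callerOptions);
```

In this sketch `lossyResult.reps` is `undefined` while `completeResult.reps` is `3`, which mirrors the silent loss described above.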

Fail-loud default

expectUsage now defaults to 'assert' (was effectively 'warn'). selfImprove is the real-run path, so a stub cell (one that produced an artifact but reported costUsd === 0 and zero tokens) fails loud rather than scoring a clean 0. Offline/replay callers opt out with expectUsage: 'off'.
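The guard can be pictured roughly as follows. This is a sketch under assumptions, not the library's code: the `CellResult` shape, the `totalTokens` field, and the `isStubCell` and `checkUsage` helpers are hypothetical. Only the three `expectUsage` modes and the stub condition (artifact present, `costUsd === 0`, zero tokens) come from the description above.

```typescript
type ExpectUsage = 'assert' | 'warn' | 'off';

// Hypothetical result shape for one evaluated cell.
interface CellResult {
  artifact: string | null;
  costUsd: number;
  totalTokens: number;
}

// A stub cell produced an artifact but reported no cost and no tokens.
function isStubCell(cell: CellResult): boolean {
  return cell.artifact !== null && cell.costUsd === 0 && cell.totalTokens === 0;
}

// Default is 'assert', matching the new fail-loud default in this PR.
function checkUsage(
  cell: CellResult,
  expectUsage: ExpectUsage = 'assert',
): 'ok' | 'warned' {
  if (expectUsage === 'off' || !isStubCell(cell)) {
    return 'ok';
  }
  const message =
    'stub cell: artifact produced but zero cost and zero tokens reported';
  if (expectUsage === 'assert') {
    throw new Error(message); // fail loud instead of scoring a clean 0
  }
  console.warn(message); // 'warn': report the problem but keep going
  return 'warned';
}
```

A deterministic mock that legitimately reports zero usage would call `checkUsage(cell, 'off')`, which is the opt-out the offline test in this PR takes.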

Verification

pnpm typecheck clean
pnpm test — 1891 passed, 2 skipped (1889 prior + 2 new: the 'assert' default fails loud on a stub while 'off' resolves; analyzeGeneration fires between generations). The new default broke no other test — proof no other caller silently relied on a stub.
One offline test updated to expectUsage: 'off' (the only selfImprove caller; honest opt-out for a deterministic mock).

Release 0.82.0 (lockstep npm + PyPI). Tag v0.82.0 after merge publishes.

…d default

selfImprove dropped reps/promoteTopK/labeledStore/captureSource/expectUsage/analyzeGeneration/findings when forwarding to runImprovementLoop, so a product collapsing onto it would silently lose capture, replicates, and the analyst loop, and would run with a weaker integrity guard. Forward all of them. A product agent now collapses its entire loop harness onto one selfImprove call with zero regression.

expectUsage now defaults to 'assert' (selfImprove is the real-run path; a stub fails loud rather than scoring a clean 0). Offline callers set 'off'.

chore(release): 0.82.0 (lockstep npm + pyproject + python __version__).

tangletools

Completes selfImprove as a one-call surface (forwards reps/promoteTopK/labeledStore/captureSource/expectUsage/analyzeGeneration/findings); fail-loud assert default. 1891 green, no other caller relied on a stub. Approving.

tangletools approved these changes Jun 5, 2026

drewstone merged commit 934bd65 into main Jun 5, 2026
1 check passed

drewstone deleted the feat/selfimprove-complete-surface branch June 5, 2026 23:25

drewstone merged 1 commit into main from feat/selfimprove-complete-surface
