feat(contract): selfImprove forwards the full loop surface + fail-loud default (0.82.0) #226

Merged: drewstone merged 1 commit into main from feat/selfimprove-complete-surface on Jun 5, 2026

@drewstone

Makes selfImprove a complete one-call surface, so a product agent can collapse its entire hand-rolled loop harness onto it with zero regression — the prerequisite for propagating the eval-campaign scaffold to the product agents.

Gap this closes

selfImprove dropped these when forwarding to runImprovementLoop: reps, promoteTopK, labeledStore, captureSource, expectUsage, analyzeGeneration, findings. A product collapsing onto it would silently lose labeled-example capture, replicates, and the analyst loop (EYES→HANDS), and run with a weaker backend-integrity guard. Now all are forwarded.
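The forwarding gap described above can be illustrated with a minimal sketch. This is not the package's code: the option types, the `maxGenerations` field, and the helper names `runLoopStandIn`, `lossyWrapper`, and `completeWrapper` are all hypothetical. Only the seven forwarded field names come from this PR.

```typescript
// Hypothetical option shape. The seven forwarded field names come from the PR
// description; their types and `maxGenerations` are assumptions for illustration.
interface LoopOptions {
  maxGenerations?: number;
  reps?: number;
  promoteTopK?: number;
  labeledStore?: string[];
  captureSource?: string;
  expectUsage?: "assert" | "warn" | "off";
  analyzeGeneration?: (generation: number) => void;
  findings?: string[];
}

// Stand-in for runImprovementLoop: reports which option keys it received.
function runLoopStandIn(options: LoopOptions): string[] {
  return Object.keys(options).sort();
}

// Lossy wrapper: forwards a hand-picked subset, so every other option
// is silently dropped before the loop sees it.
function lossyWrapper(options: LoopOptions): string[] {
  const forwarded: LoopOptions = {};
  if (options.maxGenerations !== undefined) {
    forwarded.maxGenerations = options.maxGenerations;
  }
  return runLoopStandIn(forwarded);
}

// Complete wrapper: forwards the whole option object, so nothing is lost.
function completeWrapper(options: LoopOptions): string[] {
  return runLoopStandIn({ ...options });
}
```

With options `{ maxGenerations: 2, reps: 3, promoteTopK: 1 }`, the lossy wrapper hands only `maxGenerations` to the loop, while the complete wrapper hands over all three keys.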

Fail-loud default

expectUsage now defaults to 'assert' (was effectively 'warn'). selfImprove is the real-run path, so a stub cell (one that produced an artifact but reported costUsd === 0 and zero tokens) fails loud rather than scoring a clean 0. Offline/replay callers opt out with expectUsage: 'off'.
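A minimal sketch of what such a guard could look like, assuming a hypothetical `CellResult` shape and hypothetical helpers `isStubCell` and `checkUsage`. The actual check in the package may be structured differently, and the behavior of 'warn' here (returning a message instead of throwing) is an assumption.

```typescript
type ExpectUsage = "assert" | "warn" | "off";

// Hypothetical result shape; only costUsd is named in the PR description.
interface CellResult {
  artifact: string | null;
  costUsd: number;
  totalTokens: number;
}

// A stub cell as described above: it produced an artifact but reported
// zero cost and zero tokens.
function isStubCell(cell: CellResult): boolean {
  return cell.artifact !== null && cell.costUsd === 0 && cell.totalTokens === 0;
}

// Returns a list of warnings; throws when the mode is 'assert' (the default).
function checkUsage(cell: CellResult, expectUsage: ExpectUsage = "assert"): string[] {
  if (expectUsage === "off" || !isStubCell(cell)) {
    return [];
  }
  const message = "cell produced an artifact but reported zero cost and zero tokens";
  if (expectUsage === "assert") {
    throw new Error(message);
  }
  return [message]; // 'warn': surface the problem without failing the run
}
```

Under this sketch, a deterministic mock that reports no usage would need `checkUsage(cell, "off")` to pass, which mirrors the opt-out that offline and replay callers are asked to make.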

Verification

  • pnpm typecheck clean
  • pnpm test: 1891 passed, 2 skipped (1889 prior + 2 new: the 'assert' default fails loud on a stub while 'off' resolves; analyzeGeneration fires between generations). The new default broke no other test, which shows that no other caller silently relied on a stub.
  • One offline test updated to expectUsage: 'off' (the only selfImprove caller; honest opt-out for a deterministic mock).

Release 0.82.0 (lockstep npm + PyPI). Tagging v0.82.0 after merge publishes the release.

…d default

selfImprove dropped reps/promoteTopK/labeledStore/captureSource/expectUsage/
analyzeGeneration/findings when forwarding to runImprovementLoop — so a product
collapsing onto it would silently lose capture, replicates, and the analyst
loop, and would run with a weaker integrity guard. Forward all of them. A
product agent now collapses its entire loop harness onto one selfImprove call
with zero regression.

expectUsage now defaults to 'assert' (selfImprove is the real-run path; a stub
fails loud rather than scoring a clean 0). Offline callers set 'off'.

chore(release): 0.82.0 (lockstep npm + pyproject + python __version__).

@tangletools left a comment

Completes selfImprove as a one-call surface (forwards reps/promoteTopK/labeledStore/captureSource/expectUsage/analyzeGeneration/findings); fail-loud assert default. 1891 green, no other caller relied on a stub. Approving.

@drewstone merged commit 934bd65 into main on Jun 5, 2026
1 check passed
@drewstone deleted the feat/selfimprove-complete-surface branch on June 5, 2026 at 23:25